Scalable Systems Lab / The University of New Mexico © Summer 2000 by Adrian Riedo- Slide 1 -

by Adrian Riedo - Summer 2000

High Performance Computing using Portals over TNet

PoT Project

The Portals over TNet Project

• Introduction
• Analysis: Portals 3, TNet
• Design: Case study, Concepts
• Implementation: Development System, TNAL
• Conclusion

PoT Introduction

About High Performance Computing (HPC)

• Supercomputers, superclusters
• Message Passing (e.g. MPI)
• Data movement layer
    • OS Bypass (avoid kernel calls)
    • Zero-copy (network bandwidth vs. memory bandwidth)
    • Application Bypass (large transfers without intervention by the application)
• High Performance Network
• Design rules on all levels:
    • Scalability
    • Low latency, high bandwidth
    • Portability, platform independence (host & network)

Goal of the PoT project

• Evaluation of a first implementation of Portals on TNet

PoT Analysis Portals 3

CPlant environment at Sandia National Laboratories, Albuquerque

[Diagram: CPlant software stack. Components and their modules:]
• Application (MPI) on Portals
• IO (temp)
• IP: myrip.mod
• Portals 2: portals.mod
• Portals 3: p3.mod
• RTS / CTS: rtscts.mod
• Firmware (Myrinet): rtsmcp

PoT Analysis Portals 3

Portals 3 Architecture, Network Abstraction Layer

[Diagram: Portals 3 layering]
• Levels: Application (USER), Driver (OS), Firmware (NIC)
• API: api-p30/*, nal.c
• Library: lib-p30/*, lib_nal.c ...
• Bottom: to NIC / wire

PoT Analysis Portals 3

Portals 3 Structures, Addressing

[Diagram: Portals 3 addressing structures]
• Portal Table
• Match List (me)
• Memory Descriptor List (md)
• Event Queue
• Memory Region

Labels: Application, Library
• Access Control Lists
• Network interfaces
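A minimal sketch of how these structures chain together, assuming the standard Portals 3 matching rule: the match bits of an incoming message are compared with the match bits of each entry, skipping the bit positions set in that entry's ignore bits. All type and function names below are hypothetical simplifications for illustration, not the definitions in lib-p30; event queues and access control are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified structures; not the lib-p30 definitions. */
typedef struct mem_desc {
    void  *start;    /* start of the memory region */
    size_t length;   /* size of the region in bytes */
} mem_desc_t;

typedef struct match_entry {
    uint64_t match_bits;        /* bits an incoming message must carry */
    uint64_t ignore_bits;       /* bit positions excluded from the comparison */
    mem_desc_t *md;             /* memory descriptor attached to this entry */
    struct match_entry *next;   /* next entry in the match list */
} match_entry_t;

#define PORTAL_TABLE_SIZE 8     /* assumed table size for this sketch */

typedef struct portal_table {
    match_entry_t *match_list[PORTAL_TABLE_SIZE];
} portal_table_t;

/* Walk the match list of one portal index and return the first entry
 * whose match bits accept the incoming bits, or NULL if none does. */
static match_entry_t *portal_match(const portal_table_t *pt,
                                   unsigned index, uint64_t incoming_bits)
{
    assert(pt != NULL);
    if (index >= PORTAL_TABLE_SIZE)
        return NULL;
    for (match_entry_t *me = pt->match_list[index]; me != NULL; me = me->next) {
        if (((incoming_bits ^ me->match_bits) & ~me->ignore_bits) == 0)
            return me;
    }
    return NULL;
}
```

The entry returned by portal_match() leads to a memory descriptor, which in turn names the memory region that receives the data.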

PoT Analysis TNet

TNet environment, Swiss-Tx

[Diagram: TNet software stack. Components and their modules:]
• Application (MPI) on FCI
• FCI
• FCI: tnet.mod (irq handler, kernel thread, tnet.c ...)
• Firmware (TNet): cc_b35_lc_c35

PoT Analysis TNet

TNet OSI Specification

Layer specific Communication Types

OSI Layer    Communication Type
Network      Unicast (UC), Multicast (MC)*, Broadcast (BC)
Transport    Direct Mapped (DM), Table Mapped (TM)**

*  corresponds to MPI Broadcast in 1 MPI Group
** specially for SMP Nodes (2, 4 Processors)
BC is a particular case of MC.

[Diagram: TM Example (4 Processes). Labels: Sender Process, Receiving Node, TM, Process 1, Process 2, Process 3, Process 4, DM]

PoT Analysis TNet

TNet Address Translation

CMB: Contiguous Memory Block
VCA: Virtual Communication Address
pg: Page

[Diagram: the network virtual communication address space (VCA) is translated to the host memory address space (CMB, or pages pg); translation labels: offset, pagetable]
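The two translation labels can be illustrated with a small sketch. It assumes that "offset" means a fixed base offset into one contiguous memory block and that "pagetable" means a per-page lookup; the structure layouts, the 8 KiB page size, and every name below are hypothetical and are not taken from the TNet firmware or driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_SHIFT 13u                 /* assumed 8 KiB pages */
#define PG_SIZE  (1u << PG_SHIFT)

/* Offset mode: a VCA window maps onto one contiguous memory block (CMB). */
typedef struct cmb_map {
    uint32_t  vca_base;   /* first VCA covered by the block */
    uint32_t  length;     /* block size in bytes */
    uintptr_t host_base;  /* host address of the block */
} cmb_map_t;

/* Pagetable mode: every VCA page has its own host page; 0 means unmapped. */
typedef struct page_map {
    uint32_t         vca_base;  /* first VCA covered by the table */
    size_t           npages;    /* number of table entries */
    const uintptr_t *pages;     /* host address of each page */
} page_map_t;

/* Translate through a CMB: host = host_base + (vca - vca_base). */
static int vca_to_host_offset(const cmb_map_t *m, uint32_t vca, uintptr_t *host)
{
    assert(m != NULL && host != NULL);
    if (vca < m->vca_base || vca - m->vca_base >= m->length)
        return -1;                       /* outside the block */
    *host = m->host_base + (vca - m->vca_base);
    return 0;
}

/* Translate through the page table: pick the page, keep the in-page offset. */
static int vca_to_host_paged(const page_map_t *m, uint32_t vca, uintptr_t *host)
{
    assert(m != NULL && host != NULL);
    if (vca < m->vca_base)
        return -1;
    uint32_t rel = vca - m->vca_base;
    size_t idx = rel >> PG_SHIFT;
    if (idx >= m->npages || m->pages[idx] == 0)
        return -1;                       /* out of range or unmapped page */
    *host = m->pages[idx] + (rel & (PG_SIZE - 1u));
    return 0;
}
```

In this sketch the offset variant needs only one addition per access, while the paged variant can point at scattered host pages.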

PoT Analysis TNet

TNet PCI Adapter Specification

[Diagram: adapter block diagram. Labels: CC, PLX, LC, GBE, SDRAM, SRAM, TX-FIFO, RX-FIFO, PCI]

CC: Communication Controller
• Lucent Orca 3T80-5 @ 31.25 MHz
• Tx, Rx Unit / CRC / DMW / Flags
• S(D)RAM Controllers
• Rx, Tx FIFO Interfaces
• PLX Controller

LC: Link Controller
• Lucent Orca 3T30-6 @ 62.5 MHz
• Handshake Process
• TNet retransmission protocol
• Buffer: Out 3 Packets, In 1 Packet
• CRC Check

GBE: Gigabit Ethernet
• Vitesse VSC 7211
• 62.5 MHz

SDRAM
• Page Table
• Index to Address Translation Table (for TM mode)
• Standard: 16 MB or more

SRAM
• ID Validation Table
• 128 x 18 Bit

PoT Design case study

Portals over TNet case study

Hardware Solution
• Library in hardware on NIC
• Big FPGA and fast RAM required for an optimal solution
• Special design tools
• Long implementation time
• In-depth knowledge of Portals 3 and TNet required

Software Solution
• Library still in OS
• Uses the TNet firmware & driver
• Requires knowledge of the Portals NAL and the TNet driver
• Performance workaround: pagetable as “memory descriptor”

First learn at the software level, then approach step by step.

PoT Design Concepts

Driver architecture & modules (Portals on Myrinet / FCI on TNet)

[Diagram: the two existing stacks side by side]

Portals on Myrinet
• Application (MPI) on Portals
• IO (temp)
• IP: myrip.mod
• Portals 2: portals.mod
• Portals 3: p3.mod (myrnal forward PTL_IFACE_MYR, lib-p30, lib_myrnal)
• RTS / CTS: rtscts.mod
• Firmware (Myrinet): rtsmcp

FCI on TNet
• Application (MPI) on FCI
• FCI
• FCI: tnet.mod (irq handler, kernel thread, tnet.c ...)
• Firmware (TNet): cc_b35_lc_c35

PoT Design Concepts

Driver architecture & modules (Portals, FCI on TNet)

[Diagram: the P3oT module next to the existing Myrinet stack]
• Application (MPI) on Portals
• Application (MPI) on FCI
• IO (temp)
• IP: myrip.mod
• P3oT: p3ot.mod (tnal forward PTL_IFACE_T, lib-p30, FCI, tnet.c ..., lib_tnal)
• Firmware (TNet): cc_b35_lc_c35
• Portals 2: portals.mod
• RTS / CTS: rtscts.mod
• Firmware (Myrinet): rtsmcp

PoT Design Concepts

Dataflow in the P3oT module (CMB & IRQ - Large Msgs using DMA)

[Diagram: dataflow from the application through the module to the firmware]
• Application (MPI) on Portals
• P3oT: p3ot.mod (tnal forward PTL_IFACE_T, lib-p30, tnet.c ..., lib_tnal, CMB)
• Firmware (TNet): cc_b35_lc_c35, pagetable set up for virtual CMB
• DMA

• No OS Bypass
• No zero-copy

PoT Implementation

Development System

System
• 2 Alpha 164LX workstations
• Alpha 21164 processor, 320 MB RAM
• 100BASE-T Ethernet

Mini CPlant
• 64 Bit / 33 MHz Myrinet
• Myrinet 8 port switch

TNet
• 32 Bit TNet NIC, 16 MB RAM
• No switch

OS
• Tru64, Red Hat Linux (dual boot)

PoT Implementation

TNAL (network abstraction layer for Portals over TNet)

[Diagram: lib-p30/*, lib_tnal.c ..., tnet.c, FCI; labels on the connections: ioctl; CMB, gcw]

TNET_ioctl in tnet.c
    case TNET_PTL_DISPATCH:
        copy_from_user(..);
        lib_dispatch(..);
        copy_to_user(..);
        ..
        break;

from lib_dispatch
    Do_PtlPut in wrap.c
    ..

for the PtlPut
    tnal_send in lib_tnal.c
    ..
    memcpy(..);          // for header
    copy_from_user(..);  // for data
    ..                   // send data using CMB, DMA
    ..                   // remote IRQ on last packet
    lib_finalize(..);

Incoming message
    TNET_Interrupt in tnet.c
    ..
    memcpy(..);          // for header
    lib_parse(..”header”..)

from TNET_Interrupt
    lib_parse in ~.c
    parse_put in ~.c

from parse_put
    tnal_rcv in tnal.c
    copy_to_user(..);
    lib_finalize(..);
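The send and receive paths above can be mimicked in user space to show where the copies happen. The following is only a sketch under stated assumptions: the names prefixed with sim_ are hypothetical, plain memcpy stands in for copy_from_user, copy_to_user and the DMA transfer, a flag stands in for the remote IRQ, and the header layout is invented for the example. It is not the code in tnet.c or lib_tnal.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMB_SIZE 256   /* assumed size of the simulated contiguous memory block */

/* Hypothetical message header; a real Portals 3 header carries more fields. */
typedef struct sim_hdr {
    unsigned portal_index;
    size_t   length;        /* payload length in bytes */
} sim_hdr_t;

/* Stand-in for the CMB on the receiving node. */
typedef struct sim_cmb {
    unsigned char bytes[CMB_SIZE];
    int irq_pending;        /* models the remote IRQ raised on the last packet */
} sim_cmb_t;

/* Send side: copy the header, then the payload, then raise the "IRQ". */
static int sim_tnal_send(sim_cmb_t *remote, const sim_hdr_t *hdr,
                         const void *user_data)
{
    assert(remote != NULL && hdr != NULL);
    if (hdr->length > CMB_SIZE - sizeof *hdr)
        return -1;                                    /* does not fit in the CMB */
    memcpy(remote->bytes, hdr, sizeof *hdr);          /* header */
    if (hdr->length > 0)                              /* payload: first copy */
        memcpy(remote->bytes + sizeof *hdr, user_data, hdr->length);
    remote->irq_pending = 1;
    return 0;
}

/* Receive side: read the header, check it, copy the payload to the user buffer.
 * Returns the payload length, or -1 if nothing is pending or it does not fit. */
static long sim_tnet_interrupt(sim_cmb_t *local, void *user_buf, size_t user_len)
{
    sim_hdr_t hdr;
    assert(local != NULL);
    if (!local->irq_pending)
        return -1;
    memcpy(&hdr, local->bytes, sizeof hdr);           /* header */
    if (hdr.length > CMB_SIZE - sizeof hdr || hdr.length > user_len)
        return -1;
    if (hdr.length > 0)                               /* payload: second copy */
        memcpy(user_buf, local->bytes + sizeof hdr, hdr.length);
    local->irq_pending = 0;
    return (long)hdr.length;
}
```

Each message is copied once on the way into the CMB and once on the way out, which is the reason this path offers neither OS bypass nor zero-copy.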

PoT Implementation

Milestones

• Setting up PC / Development System
• TNet Documentation / VHDL & C Sources Presentation
• Getting familiar with Portals
• Experimenting with MPICH

• Install Myrinet, TNet, FCI, Portals on Tru64 / Linux
• Experimenting with modules & test programs for TNet
• Presentations / Website

• PoT Design
• Writing hybrid module (P3oT)
• Debugging
• Benchmarking

• Report

PoT Prospects

Dataflow in the P3oT module (Pagetable - Small Msgs using PIO)

[Diagram: dataflow from the application through the module to the firmware]
• Application (MPI) on Portals
• P3oT: p3ot.mod (tnal forward PTL_IFACE_T, lib-p30, tnet.c ..., lib_tnal)
• LIB
• CMB
• Firmware (TNet): cc_b35_lc_c35, pagetable points to application space
• PIO, dynamic pagetable

• OS Bypass
• Zero-copy

PoT Conclusion

Conclusions

• CMB software solution: approx. 80 µs latency (first version)
• Not the best solution, but learned a lot
• Software solution profits from CRC and retransmit on card
• TNAL lays the basis for further research
• Implementation using Pagetable & PIO for OS Bypass

Experience

• Analysis and design take a lot of time (important)
• Wide knowledge needed
• Kernel programming is not trivial
• Long debugging time compared to applications

PoT Project

The PoT website at http://hpc.fribyte.ch