Rdma presentation-kisti-v2
Uploaded by balmanme
RDMA for High Performance Data Movement
Network I/O operations are costly:
− CPU load
− Context switching
− Memory latency
Zero-copy networking
− NIC copies data directly to/from application memory
IB transport (HPC applications)
iWARP (TCP stack / TOE)
RDMA model
One-sided operations
− Get/Put semantics
Send/receive
Direct data placement
− RDMA Write
− RDMA Read
Asynchronous
− Work Queue (send queue / receive queue)
− Completion Queue
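
The asynchronous model above can be illustrated with a toy simulation. This is only a sketch of the idea (post a work request, return immediately, learn the outcome later from a completion queue); it does not call any RDMA library, and the names ToyQueuePair, post_send, nic_progress and poll_cq are hypothetical stand-ins chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int     # caller-chosen identifier, echoed back in the completion
    opcode: str    # e.g. "RDMA_WRITE", "RDMA_READ" or "SEND"
    length: int    # number of bytes to move


@dataclass
class Completion:
    wr_id: int
    status: str


class ToyQueuePair:
    """Toy model of a send queue plus completion queue (no real NIC involved)."""

    def __init__(self):
        self.send_queue = deque()
        self.completion_queue = deque()

    def post_send(self, wr):
        # Posting only enqueues the request; it never blocks on the transfer.
        self.send_queue.append(wr)

    def nic_progress(self, max_ops=1):
        # Stand-in for the NIC hardware: consume work requests in order
        # and report each one on the completion queue.
        for _ in range(min(max_ops, len(self.send_queue))):
            wr = self.send_queue.popleft()
            self.completion_queue.append(Completion(wr.wr_id, "SUCCESS"))

    def poll_cq(self, max_entries):
        # Non-blocking poll: return whatever has completed so far.
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


qp = ToyQueuePair()
for i in range(3):
    qp.post_send(WorkRequest(wr_id=i, opcode="RDMA_WRITE", length=4096))
nothing_yet = qp.poll_cq(8)    # empty: the "NIC" has not made progress
qp.nic_progress(max_ops=2)
finished = qp.poll_cq(8)       # completions for the first two requests
```

The application is free to do other work between post_send and poll_cq, which is the property that makes buffer management (when may a buffer be reused?) the application's responsibility.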
RDMA Programming Model
Objects
− Queue Pairs (protection domain)
− Send queue (RDMA write, RDMA read)
− Receive queue
− Modify state
− Completion queue (poll)
− Memory region (MR)
Functions (verbs)
− IB (libmlx4)
− iWARP (libcxgb3)
Librdmacm (connection setup)
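
The "modify state" step for a queue pair can be pictured as a small state machine. The sketch below is a simplified model, assuming only the usual bring-up path RESET, INIT, RTR (ready to receive), RTS (ready to send); error and drain states are omitted, and the class ToyQP is hypothetical, not part of any verbs library.

```python
# Simplified bring-up path of a queue pair; error and drain states omitted.
NEXT_STATE = {"RESET": "INIT", "INIT": "RTR", "RTR": "RTS"}


class ToyQP:
    """Toy queue pair that only tracks its connection state."""

    def __init__(self):
        self.state = "RESET"

    def modify(self, new_state):
        # Only the next step of the bring-up path is accepted.
        if NEXT_STATE.get(self.state) != new_state:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def can_post_recv(self):
        # Receive buffers may be posted once the QP has left RESET.
        return self.state in ("INIT", "RTR", "RTS")

    def can_post_send(self):
        # Send-side work requests are processed only in RTS.
        return self.state == "RTS"


qp = ToyQP()
qp.modify("INIT")
ready_for_recv = qp.can_post_recv()   # True
ready_for_send = qp.can_post_send()   # False until RTS
qp.modify("RTR")
qp.modify("RTS")
```

A connection manager library typically hides most of these transitions, which is one reason connection setup is listed separately from the verbs themselves.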
RDMA/iWARP
Implicit RDMA support
Explicit RDMA support
iWARP
− Encapsulates RDMA traffic at a high level
− Uses the TCP stack
− Is it beneficial without TOE?
Alternative Approaches
RDMA over Converged Ethernet (RoCE)
− Lightweight RDMA transport over Ethernet
− Widely deployed technology
− Supports kernel bypass
− OFED 1.5.1 supports RoCE
SoftRDMAs...
− SoftRoCE (OFED 1.5.1 supports softRoCE)
− SoftiWARP (new TCP kernel stack)
Hidden Cost
Memory Registration
− RDMA Read/Write
Connection Setup
− Librdmacm
→ Bulk data movement?
Asynchronous Model
− Buffer Management
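
One common way to limit the memory registration cost listed above is to register a fixed set of buffers once and reuse them for every transfer. The following is a minimal sketch of that idea, assuming registration is the expensive step; RegisteredBufferPool is a hypothetical name, and the "registration" here is only a counter, not a real pin-and-register call.

```python
class RegisteredBufferPool:
    """Toy pool that pays the registration cost once per buffer, then reuses it."""

    def __init__(self, count, size):
        self.registrations = 0
        self._free = [self._register(size) for _ in range(count)]

    def _register(self, size):
        # Stand-in for the expensive step (pinning pages, registering with the NIC).
        self.registrations += 1
        return bytearray(size)

    def acquire(self):
        # In an asynchronous design the caller would wait for a completion here.
        if not self._free:
            raise RuntimeError("no free buffer; wait for a completion")
        return self._free.pop()

    def release(self, buf):
        # Called once the completion for the transfer using buf has been seen.
        self._free.append(buf)


pool = RegisteredBufferPool(count=4, size=1024)
for _ in range(100):           # 100 transfers...
    buf = pool.acquire()
    buf[:5] = b"hello"         # fill the buffer, post the transfer, see completion
    pool.release(buf)
total_registrations = pool.registrations   # ...but still only 4 registrations
```

The trade-off is that a fixed pool bounds the number of transfers in flight, so pool size and buffer size become tuning parameters.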
Challenges in Bulk Transfer
Application Level Adjustments
Request Aggregation
− Small data files
− Is an FTP-like transfer mechanism appropriate for RDMA?
File System Overhead
− Asynchronous Operations
Connection Caching / Multiple Connections?
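
Request aggregation for small files can be sketched as grouping files into batches that each fit one transfer block, so that per-request overhead is paid per batch rather than per file. This is an illustrative sketch only; the function name aggregate, the (name, size) input format and the block size are assumptions, not something the slides specify.

```python
def aggregate(files, block_size):
    """Group (name, size) pairs, in order, into batches of at most block_size bytes.

    A file larger than block_size ends up alone in its own batch.
    """
    batches, current, used = [], [], 0
    for name, size in files:
        # Start a new batch when the next file would overflow the current one.
        if current and used + size > block_size:
            batches.append(current)
            current, used = [], 0
        current.append((name, size))
        used += size
    if current:
        batches.append(current)
    return batches


files = [("a", 300), ("b", 500), ("c", 400), ("d", 2000), ("e", 100)]
batches = aggregate(files, block_size=1000)
batch_names = [[name for name, _ in batch] for batch in batches]
```

With these inputs, five files are sent as four requests; with many small files the reduction is much larger.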
Local Area / Wide Area
IB RDMA designed for local area
− How does RDMA perform in wide area?
iWARP
− No promising results
− Over TCP (with TOE?)
− SoftiWARP ???
RoCE
− Isolated traffic? / much less CPU usage
− softRoCE?
GridFTP over RDMA
XIO driver for GridFTP
− Experimented using Chelsio cards (cxgb3)
− 10GE
− WAN testing in progress!
− Local area: 910 MBps – 1175 MBps
− Much better than GridFTP over TCP
− Much less CPU load (1/2)
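
To put the local area figures in context, they can be compared with the raw capacity of a 10GE link. The calculation below is a rough sketch that assumes the reported figures are megabytes per second in decimal units and ignores protocol overhead; the function name utilization is hypothetical.

```python
def utilization(throughput_mbytes_per_s, link_gbits_per_s=10):
    """Fraction of raw link capacity used, with decimal units (1 Gb = 1000 Mb)."""
    link_mbytes_per_s = link_gbits_per_s * 1000 / 8   # 10 Gb/s -> 1250 MB/s
    return throughput_mbytes_per_s / link_mbytes_per_s


low = utilization(910)     # about 0.73 of the raw link rate
high = utilization(1175)   # about 0.94 of the raw link rate
```

Under those assumptions the reported range corresponds to roughly 73% to 94% of the raw 10GE rate.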
FTP100 – FTP over RDMA
Experimented with Mellanox cards
− Local area: 10GE
− iWARP did not perform well compared to TCP
− No significant gain
− RoCE tests in progress (have some initial results)
− Limited by the disk performance
Mem2mem:
− Can already saturate the 10GE link
What is Next?
Experiments
− RDMA model over WAN
SoftiWARP from IBM Zurich
− TCP kernel stack implementing/defining RDMA iverbs
SoftRoCE: OFED 1.5.2-rxe distribution
− Multiple connections?
Transfer Applications over RDMA
Simple Client/Server:
− Developing a prototype for transferring climate datasets using RDMA protocols
− Asynchronous memory management module
Application level tuning?
− Memory regions (max/min?)
− Multiple QPs
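
One possible way to use multiple QPs is to split a large transfer into chunks and assign them to the QPs in round-robin order. The sketch below only computes such a plan; the function name stripe and the chunk size are assumptions for illustration, and nothing in the slides says the prototype works this way.

```python
def stripe(total_bytes, chunk_size, num_qps):
    """Assign (offset, length) chunks of a transfer to QPs in round-robin order."""
    plan = [[] for _ in range(num_qps)]
    offset, index = 0, 0
    while offset < total_bytes:
        length = min(chunk_size, total_bytes - offset)   # last chunk may be short
        plan[index % num_qps].append((offset, length))
        offset += length
        index += 1
    return plan


plan = stripe(total_bytes=10, chunk_size=4, num_qps=2)
# QP 0 gets chunks at offsets 0 and 8, QP 1 gets the chunk at offset 4.
```

The chunk size would interact with the memory region sizes raised in the tuning question above, since each chunk has to fit in a registered buffer.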
Climate Analysis
Climate Applications are Data-Intensive
Shared data repository:
− Data files need to be downloaded for further processing and analysis
− Data retrieval is the main bottleneck
− Multiple clients (working as VM instances)
− Cannot depend on HW support
− SoftRoCE? softiWARP
What can we do for WAN testing?
Q&A?
→ https://sdm.lbl.gov/climate100/