Develop Application with Open Fabrics Yufei Ren Tan Li.
Date posted: 20-Dec-2015
Develop Application with Open Fabrics
Yufei Ren, Tan Li
Agenda
• RDMA concept review
• Modules in OFED-1.5.1 userspace
• librdmacm (RDMA Communication)
• libibverbs (InfiniBand)
• Installing OFED on FedoraCore12/RHEL5
• About Lustre & future work
RDMA ?
• RDMA: networking technologies that have a software interface with three features:
  – Remote DMA (RDMA write, RDMA read)
  – Asynchronous work queues (as Tan has illustrated)
  – Kernel bypass
RDMA - Kernel bypass
[Figure: two panels labeled "non-iWARP" and "iWARP"]
RDMA Verbs and Objects
• Not quite an API
• Abstract definition of functionality
• “Resources (Objects) operated on by Verbs (functions).”
  – For example, a Queue Pair or Completion Queue is operated on by Create/Destroy.
  – rdma_create_qp()/rdma_destroy_qp() in librdmacm/include/rdma_cma.h
• May be thought of as objects and methods in an OO language (C++/Java).
What is OpenFabrics
• includes:
  – Kernel-level drivers
  – Channel-oriented RDMA bypasses
  – Application Program Interface (API)
• for:
  – Parallel Message Passing (MPI)
  – Socket Data Exchange (SDP)
  – File System (Lustre)
Modules in OFED-1.5.1 userspace
• librdmacm: Linux library to abstract connection setup.
• libibverbs: a library that allows programs to use RDMA "verbs" for direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace.
• device-specific drivers:
  – IB: libmthca, libmlx4, libipathverbs, libehca
  – iWARP: libcxgb3, libamso
librdmacm
• Linux library to abstract connection setup. Same code runs on IB and iWARP fabric technologies.
• Mimics the TCP socket model (socket, connect, bind, listen, accept, getaddrinfo, etc.). A cm_id is the analog of a socket.
• IP addressing can be used on iWARP, and also on InfiniBand (via IPoIB).
• Additional address/route resolution steps:
  – rdma_resolve_addr()
  – rdma_resolve_route()
• Events are reported through “channels”:
  – rdma_create_event_channel()
  – rdma_get_cm_event()
  – rdma_ack_cm_event()
An example of ftp via OpenFabrics
[Figure: call sequence between an RDMA FTP client and an RDMA FTP server. Other labels in the figure: Put, Get, FTP Protocol, FS.]
• RDMA FTP Server: rdma_getaddrinfo(), rdma_create_ep(), rdma_listen(), rdma_accept() (blocks until connection from client), rdma_get_recv_comp(), rdma_post_send(), rdma_disconnect(), rdma_dereg_mr(), rdma_destroy_ep()
• RDMA FTP Client: rdma_getaddrinfo(), rdma_create_ep(), rdma_connect(), rdma_post_send(), rdma_get_recv_comp(), rdma_disconnect(), rdma_dereg_mr(), rdma_destroy_ep()
• Arrows between client and server: connection establishment, data, data
librdmacm – initialization
• rdma_create_event_channel()
  – Open a channel used to report communication events. Asynchronous events are reported to users through event channels. Each event channel maps to a file descriptor.
• rdma_create_id()
  – Allocate a communication identifier. Creates an identifier that is used to track communication information, much like a socket file descriptor.
librdmacm – active connection steps
• rdma_resolve_addr()
  – Resolve destination and optional source addresses from IP addresses to an RDMA address. If successful, the specified rdma_cm_id will be bound to a local device. Comparable to getaddrinfo() in the socket API.
• rdma_resolve_route()
  – Resolve the route information needed to establish a connection. This is called on the client side of a connection after calling rdma_resolve_addr(), but before calling rdma_connect().
• rdma_connect()
  – Initiate an active connection request.
librdmacm – passive connection steps
• rdma_bind_addr()
  – Bind an RDMA identifier to a source address.
• rdma_listen()
  – Listen for incoming connection requests.
• rdma_accept()
  – Called to accept a connection request.
librdmacm – data transfer
• ibv_post_send() (libibverbs)
  – opcode == IBV_WR_RDMA_READ: RDMA read
• ibv_post_send() (libibverbs)
  – opcode == IBV_WR_RDMA_WRITE: RDMA write
• Example: librdmacm/example/rping.c
librdmacm – Abbreviations
• QP: queue pair
• CQ: completion queue
• WQ: work queue
• MR: memory region
• PD: protection domain
• SRQ: shared receive queue
• AH: address handle
• MW: memory window
libibverbs
• libibverbs is a library that allows programs to use RDMA "verbs" for direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace.
• Linux implementation of RDMA verbs.
• Loads device-specific drivers for hardware support.
  – IB: libmthca, libmlx4, libipathverbs, libehca
  – iWARP: libcxgb3, libamso
Install OFED on FedoraCore12
http://docs.google.com/Doc?docid=0AYXBBIFwi6bqZGY5cm1jeGJfNjAzc2N6eGt2Mw&hl=en
Lustre
• File system clients
• Object Storage Servers (OSS): provide file I/O services
• Metadata Servers (MDS): manage the names and directories in the file system
Lustre – cont’
Future work
• Run the OpenFabrics examples on netqos04.
• Configure Lustre on netqos04. A real cluster needs more machines. LPAR?
• OpenFabrics sources and RFC 5040/5041/5044.