Implementing TCP Sockets over RDMA

Patrick MacArthur and Robert D. Russell

Department of Computer Science, University of New Hampshire, Durham, NH 03824-3591, USA

2nd Annual 2014 InfiniBand User Group Workshop, April 3, 2014, 2:30pm


Acknowledgements

The authors would like to thank the University of New Hampshire InterOperability Laboratory for the use of their RDMA cluster for the development, maintenance, and testing of UNH EXS. We would also like to thank the UNH-IOL and Ixia for the use of an Anue network emulator for performance testing.

This material is based upon work supported by the National Science Foundation under Grant No. OCI-1127228 and under the National Science Foundation Graduate Research Fellowship Program under award number DGE-0913620.


Outline

1 Background

2 RSockets

3 UNH EXS

4 Performance Evaluation

5 Conclusions

6 References


Differences Between RDMA and TCP Sockets

RDMA

“Kernel bypass”: data transfers with no OS involvement

“Zero-copy”: Direct virtual memory to virtual memory transfers

Message-oriented

Asynchronous programming interface

TCP Sockets

Kernel involvement in all data transfers

Buffered in kernel-space on both sides of connection

Byte-stream oriented protocol

Synchronous programming interface


TCP Sockets Data Transfer

[Figure: on the sender, send copies the user send buffer into an intermediate send buffer in the kernel; TCP carries the data to an intermediate receive buffer in the receiver's kernel; recv copies it into the user receive buffer.]


Message vs. Byte Stream Semantics

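Message Transfer (RDMA, UDP, SOCK_SEQPACKET)

[Figure: the sender sends "Hello" then "World"; the receiver's buffers show "Hello" and "W".]

Byte Stream Transfer (TCP/IP)

[Figure: the sender sends "Hello" then "World"; the receiver's buffers show "Hello" and "World", with an additional "Ok" on each side.]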
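The difference between the two semantics can be observed on a single machine. The following is a minimal sketch, assuming Linux, that uses AF_UNIX socket pairs as a stand-in for real network connections (an assumption made so the example needs no RDMA hardware or peer host). The helper names first_recv_size and recv_exact are hypothetical. With SOCK_SEQPACKET the first recv() returns exactly one 5-byte message; with SOCK_STREAM it may return any prefix of the 10 bytes sent, so a byte-stream receiver has to loop until it has the amount it expects.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send "Hello" and "World" with two separate send() calls over a local
 * socket pair of the given type (SOCK_STREAM or SOCK_SEQPACKET), then
 * return the number of bytes delivered by the first recv() into a
 * 64-byte buffer. Returns -1 on error.
 */
ssize_t first_recv_size(int type)
{
    int sv[2];
    char buf[64];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    if (send(sv[0], "Hello", 5, 0) == 5 && send(sv[0], "World", 5, 0) == 5)
        n = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}

/*
 * Byte-stream receivers must loop, because recv() may return fewer
 * bytes than requested. Returns len on success, -1 on error or if the
 * peer closed the connection before len bytes arrived.
 */
ssize_t recv_exact(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

Calling first_recv_size(SOCK_SEQPACKET) reports one whole message, while first_recv_size(SOCK_STREAM) reports however many bytes happened to be buffered, which is why stream applications frame their own messages with a loop such as recv_exact.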


Issue: O_NONBLOCK is not asynchronous

Try-and-fail

recv: if no data in buffer, fail immediately with EAGAIN
send: if buffer is full, fail immediately with EAGAIN

POSIX poll/select notify when an operation can start, not when operations complete

recv: poll/select returns when data in buffer
send: poll/select returns when empty space in buffer

This is incompatible with RDMA semantics

RDMA: send or recv queued, call returns immediately, proceeds in background
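The try-and-fail behaviour can be sketched with standard POSIX calls. This is a minimal illustration, assuming Linux and using an AF_UNIX socket pair in place of a TCP connection (an assumption made to keep the example self-contained). The struct nb_result and the function nonblocking_demo are hypothetical names introduced only for this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* What nonblocking_demo() observed at each step. */
struct nb_result {
    int recv_errno_when_empty;  /* errno from recv() with nothing buffered */
    int poll_when_empty;        /* poll() return value before any data */
    int poll_after_send;        /* poll() return value once data is buffered */
    ssize_t recv_after_send;    /* bytes returned by recv() after poll() */
};

/* Returns 0 on success and fills *r, or -1 if the setup failed. */
int nonblocking_demo(struct nb_result *r)
{
    int sv[2];
    int rc = -1;
    char c;
    struct pollfd pfd;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Put the receiving end into non-blocking mode. */
    int flags = fcntl(sv[1], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[1], F_SETFL, flags | O_NONBLOCK) < 0)
        goto out;

    /* Try-and-fail: no data is buffered, so recv() fails immediately. */
    errno = 0;
    r->recv_errno_when_empty = (recv(sv[1], &c, 1, 0) < 0) ? errno : 0;

    /* poll() with a zero timeout also reports that a recv() cannot start. */
    pfd.fd = sv[1];
    pfd.events = POLLIN;
    pfd.revents = 0;
    r->poll_when_empty = poll(&pfd, 1, 0);

    /* Once the peer has sent data, poll() reports that a recv() can start. */
    if (send(sv[0], "x", 1, 0) != 1)
        goto out;
    r->poll_after_send = poll(&pfd, 1, 1000);
    r->recv_after_send = recv(sv[1], &c, 1, 0);
    rc = 0;

out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Note that nothing in this sequence queues a receive that completes later: the application only learns when it may call recv() again, which is the mismatch with RDMA's queued operations described above.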


Issue: Implementing “Zero-copy”

Memory to be used for RDMA must (currently) be registered

Existing sockets programs do not register memory to be used in I/O operations

May use any malloc’d/stack variable
May be freed at any time
Sockets programmers assume memory can be reused as soon as send() returns

Not respecting adapter’s natural alignment can cause severe performance degradation, especially on FDR adapters
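The buffer-reuse assumption is easy to demonstrate with ordinary sockets. The sketch below assumes Linux and uses an AF_UNIX stream socket pair in place of a TCP connection (an assumption made to keep the example self-contained). The function name reuse_after_send is hypothetical. Because send() copies the data out of the user buffer before it returns, overwriting the buffer afterwards does not change what the receiver sees. A zero-copy transport cannot offer this guarantee unless it delays the return of send() or tracks the buffer some other way.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send 4 bytes of 'A', overwrite the user buffer with 'B' as soon as
 * send() returns, then receive on the other end. The bytes seen by the
 * receiver are stored in out[0..3]. Returns 0 on success, -1 on error.
 */
int reuse_after_send(char out[4])
{
    int sv[2];
    char buf[4];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    memset(buf, 'A', sizeof(buf));
    if (send(sv[0], buf, sizeof(buf), 0) == (ssize_t)sizeof(buf)) {
        /* Typical sockets code: the buffer is reused immediately. */
        memset(buf, 'B', sizeof(buf));

        /* The receiver still gets the data as it was when send() ran. */
        if (recv(sv[1], out, 4, MSG_WAITALL) == 4)
            rc = 0;
    }

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```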


Prior Implementations of Sockets over RDMA

Sockets Direct Protocol (SDP) (defined by InfiniBand specification [InfiniBand 2011])

BCopy (buffering on both sides)
ZCopy (zero-copy, send() blocks) [Goldenberg 2005]
AZ-SDP (asynchronous, zero-copy, segfault handler) [Balaji 2006]

uStream (asynchronous but not zero-copy) [Lin 2009]


Current Implementations of Sockets over RDMA

SMC-R (100% compatibility with TCP/IP and sockets)

rsockets (high-performance sockets replacement) [Hefty 2012]

UNH EXS (extended sockets) [ISC 2005, Russell 2009, MacArthur 2014]


RSockets

Goal: compatibility with sockets, high performance

Built on RDMA, so kernel bypass for data transfer path

Buffer copies on both sides of connection

Supports SOCK_STREAM (TCP-like) and SOCK_DGRAM (UDP-like) modes

API is currently synchronous only


RSocket Data Transfer with rsend/rrecv

[Figure: rsend copies the user send buffer into an intermediate send buffer; RDMA WRITE WITH IMM moves the data to an intermediate receive buffer on the receiver; rrecv copies it into the user receive buffer. All buffers are in user space.]


“Zero-copy” with rsockets

Can perform zero-copy using riomap and riowrite

riomap maps a virtual memory region to an offset

riowrite directly transfers data to iomap’d buffer identified by offset

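Example

/* *************** at receiver ***************** */
char empty = 0;  /* one-byte completion notification */
/* map target_buf for remote writes; riomap takes a flags argument (0 here) before the offset (-1 here) */
off_t target_offset = riomap(fd, target_buf, len, PROT_WRITE, 0, -1);
/* tell the sender which offset to write to */
rsend(fd, &target_offset, sizeof(target_offset), 0);
/* wait for the sender's completion notification */
rrecv(fd, &empty, sizeof(empty), MSG_WAITALL);

/* *************** at sender ***************** */
char empty = 0;  /* one-byte completion notification */
off_t target_offset;
rrecv(fd, &target_offset, sizeof(target_offset), MSG_WAITALL);
/* write big buffer to server */
riowrite(fd, local_buf, length, target_offset, 0);
/* notify recipient of completion */
rsend(fd, &empty, sizeof(empty), 0);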


UNH EXS (Extended Sockets)

Based on ES-API (Extended Sockets API) published by the Open Group [ISC 2005]

Extensions to sockets API to provide asynchronous, zero-copy transfers

Memory registration (exs_mregister(), exs_mderegister())
Event queues for completion of asynchronous events (exs_qcreate(), exs_qdequeue(), exs_qdelete())
Asynchronous operations (exs_send(), exs_recv(), exs_accept(), exs_connect())

UNH EXS supports SOCK_SEQPACKET (reliable message-oriented) and SOCK_STREAM (reliable stream-oriented) modes


UNH EXS Programming

Example: asynchronous send operation

/* register the buffer before using it in an EXS transfer */
exs_mhandle_t mh = exs_mregister(buf, bufsize, EXS_ACCESS_READ);
/* create the event queue that will receive the completion event */
exs_qhandle_t qh = exs_qcreate(10);
/* start the send; completion is reported later through qh */
if (exs_send(fd, buf, bufsize, 0, mh, 0, qh) < 0) {
    perror("Could not start send operation");
    /* bail out */
}
/* do work in parallel with data transfer */
exs_event_t ev;
/* dequeue one completion event */
if (exs_qdequeue(qh, &ev, 1, NULL) < 0) {
    perror("Could not get send completion event");
    /* bail out */
}
fprintf(stderr, "Send of %d/%d bytes complete with errno=%d\n",
        bufsize, ev.exs_evt_union.exs_evt_xfer.exs_evt_length,
        ev.exs_evt_errno);


UNH EXS Protocol

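Direct Transfer (SOCK_STREAM and SOCK_SEQPACKET)

[Figure: the receiver's exs_recv sends an ADVERT to the sender; the sender's exs_send then performs a DIRECT transfer from the user send buffer into the user receive buffer.]

Indirect Transfer (SOCK_STREAM only)

[Figure: exs_send performs an INDIRECT transfer from the user send buffer into an intermediate receive buffer on the receiver; exs_recv then performs a COPY into the user receive buffer.]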
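The choice between the two transfer types can be modelled as a small decision function. This is only a toy sketch under stated assumptions: the rule that a direct transfer is used when an ADVERT has been received is inferred from the diagram, not taken from the UNH EXS source, and every name below (the enums and choose_transfer) is hypothetical.

```c
#include <stdbool.h>

/* Socket modes supported by UNH EXS (hypothetical names for this sketch). */
enum sketch_mode { SKETCH_STREAM, SKETCH_SEQPACKET };

/* Possible sender actions in this toy model. */
enum sketch_xfer {
    SKETCH_DIRECT,          /* write straight into the advertised user buffer */
    SKETCH_INDIRECT,        /* write into the receiver's intermediate buffer */
    SKETCH_WAIT_FOR_ADVERT  /* nothing can be sent yet */
};

/*
 * Assumed rule: an ADVERT from the receiver allows a direct transfer in
 * either mode; without one, only SOCK_STREAM may fall back to an indirect
 * transfer, because indirect transfers are stream-only.
 */
enum sketch_xfer choose_transfer(enum sketch_mode mode, bool advert_received)
{
    if (advert_received)
        return SKETCH_DIRECT;
    if (mode == SKETCH_STREAM)
        return SKETCH_INDIRECT;
    return SKETCH_WAIT_FOR_ADVERT;
}
```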


Performance Evaluation

Comparison of TCP, rsockets, and EXS using rstream and riostream tools

rsockets rrecv()/rsend(): rstream
rsockets riomap()/riowrite(): riostream
TCP: rstream
EXS: rstream-like utility modified to take advantage of asynchronous operations

No optimization done for rsocket and TCP cases

Systems used: Intel Xeon 2.40 GHz E5-2609 CPUs, 64 GB RAM, PCI-e Gen 3

HCAs: Mellanox ConnectX-3 FDR InfiniBand HCAs connected via Mellanox SX6036 FDR InfiniBand switch


Throughput Comparison

[Figure: throughput (gigabits per second, axis 0 to 100) versus message size (512 B to 512 MiB) for streamexs, rstream, riostream, and rstream-tcp.]


CPU Usage Comparison

[Figure, two panels: client CPU utilization and server CPU utilization. Each plots CPU usage (percent, axis 0 to 200) versus message size (512 B to 512 MiB) for streamexs, rstream, riostream, and rstream-tcp.]


Latency Comparison

[Figure: time per message (microseconds, logarithmic axis 1 to 1000) versus message size (32 B to 512 KiB) for streamexs, rstream, riostream, and rstream-tcp.]


UNH EXS SOCK_SEQPACKET vs. SOCK_STREAM

[Figure, four panels comparing pumpexs-seqpacket and pumpexs-stream over message sizes from 512 B to 512 MiB: throughput (gigabits per second, axis 0 to 100); client CPU utilization (percent, axis 0 to 200); time per message (microseconds, logarithmic axis 1 to 1e+07); server CPU utilization (percent, axis 0 to 200).]


Conclusions

Socket buffering is implicit in sockets API

Efficient zero-copy requires extensions to API

ES-API provides asynchronous operation
rsockets provides riomap()/riowrite()

Message-oriented is more efficient to implement over RDMA than byte stream-oriented


References I

InfiniBand Trade Association, “Supplement to InfiniBand Architecture Specification Volume 1, Release 1.2.1: Annex A4: Sockets Direct Protocol (SDP),” Oct. 2011.

D. Goldenberg, M. Kagan, R. Ravid, and M. S. Tsirkin, “Zero copy sockets direct protocol over InfiniBand - preliminary implementation and performance analysis,” in High Performance Interconnects, 2005. Proceedings. 13th Symposium on. IEEE, 2005, pp. 128–137.


References II

P. Balaji, S. Bhagvat, H.-W. Jin, and D. K. Panda, “Asynchronous zero-copy communication for synchronous sockets in the sockets direct protocol (SDP) over InfiniBand,” in Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. IEEE, 2006, 8 pp.

Y. Lin, J. Han, J. Gao, and X. He, “uStream: a user-level stream protocol over InfiniBand,” in Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on. IEEE, 2009, pp. 65–71.


References III

S. Hefty, Rsockets. Available: https://www.openfabrics.org/ofa-documents/doc_download/495-rsockets.html

Interconnect Software Consortium in association with the Open Group, “Extended Sockets API (ES-API) Issue 1.0,” Jan. 2005.

R. D. Russell, “A General-purpose API for iWARP and InfiniBand,” in the First Workshop on Data Center Converged and Virtual Ethernet Switching (DC-CAVES), Sep. 2009.


References IV

P. MacArthur and R. Russell, “An Efficient Method for Stream Semantics over RDMA,” in Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2014), May 2014, to appear.
