Network Stack Specialization for Performance
Presented by Donghwi Kim
(Some figures are taken from the paper)
Objective
• The authors try to show an upper bound on network application performance achievable through specialization (in fact, not only the network stack but also the application’s implementation is specialized)
• A special class of applications is chosen: those that serve the same content to many users
  • Sandstorm: a Web server that serves static web pages
  • Namestorm: a DNS server
Keys to performance
• A complete zero-copy stack
• Aggressive amortization
• Pre-packetized data
• Batching to mitigate system-call overhead
• Synchronous processing, clocked from received packets
  • Improves cache locality
  • Minimizes the latency of sending the first packet of a response
• Intel’s DDIO
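A minimal Python sketch of how batching and reception-clocked processing can fit together, assuming a simulated NIC: `SimulatedRings`, `sync`, and `run_event_loop` are hypothetical names, not Sandstorm’s or libnmio’s API. Each `sync` call stands in for one system call that both flushes filled TX slots and returns up to `max_batch` received packets, and responses are produced only while a received batch is being processed.

```python
from collections import deque


class SimulatedRings:
    """Hypothetical stand-in for NIC RX/TX rings shared with user space."""

    def __init__(self, incoming):
        self.rx = deque(incoming)   # packets waiting in the RX ring
        self.tx_slots = []          # TX slots filled since the last sync
        self.transmitted = []       # packets handed to the NIC
        self.syscalls = 0

    def sync(self, max_batch):
        # One "system call" flushes all filled TX slots and returns up to
        # max_batch received packets, so its fixed cost is shared by the batch.
        self.syscalls += 1
        self.transmitted.extend(self.tx_slots)
        self.tx_slots.clear()
        batch = []
        while self.rx and len(batch) < max_batch:
            batch.append(self.rx.popleft())
        return batch


def run_event_loop(rings, handle, max_batch):
    """Synchronous loop: all work, including transmission, is clocked by reception."""
    while True:
        batch = rings.sync(max_batch)
        if not batch:
            return
        for request in batch:
            # The response is built immediately, with no application-level
            # queue (in a real stack, while the request data is cache-hot).
            rings.tx_slots.extend(handle(request))
```

With `max_batch=32`, 100 requests cost 5 sync calls instead of 101 with `max_batch=1`. The price is that a finished response waits in `tx_slots` until the next sync, which is the latency cost of batching.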
Network stack
• libnmio: data-movement and event-notification primitives
• libeth: a lightweight Ethernet layer
• libtcpip: an optimized TCP/IP layer
• libudpip: a UDP/IP layer
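To illustrate this kind of layering, here is a hedged Python sketch of zero-copy parsing in which each layer is a separate function returning a `memoryview` into the same frame buffer. The function names are hypothetical and do not reproduce libeth or libudpip; only the Ethernet, IPv4, and UDP header layouts are standard. Because each layer stands alone, an application can call whichever layer it needs directly.

```python
import struct

ETH_HEADER = struct.Struct("!6s6sH")    # destination MAC, source MAC, EtherType
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def eth_parse(frame: memoryview):
    """Ethernet layer: return (ethertype, payload view) without copying."""
    _dst, _src, ethertype = ETH_HEADER.unpack_from(frame, 0)
    return ethertype, frame[ETH_HEADER.size:]


def ipv4_parse(packet: memoryview):
    """IPv4 layer: return (protocol, src, dst, payload view)."""
    version, ihl = packet[0] >> 4, (packet[0] & 0x0F) * 4
    if version != 4 or ihl < 20:
        raise ValueError("not a valid IPv4 header")
    (total_length,) = struct.unpack_from("!H", packet, 2)
    src, dst = struct.unpack_from("!4s4s", packet, 12)
    return packet[9], src, dst, packet[ihl:total_length]


def udp_parse(datagram: memoryview):
    """UDP layer: return (source port, destination port, payload view)."""
    sport, dport, length, _checksum = struct.unpack_from("!HHHH", datagram, 0)
    return sport, dport, datagram[8:length]
```

No payload bytes are copied: the view returned by `udp_parse` still refers to the original frame buffer.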
A complete zero-copy stack
• Receiving a packet
  • Done by DMA
• Transmitting a packet
  • Aggressive amortization
  • Modify one of the prepared copies of a packet and send it with DMA
  • The modifications are performed in a single pass so that the CPU’s L1 cache is used efficiently
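A minimal sketch of pre-packetized transmission in Python. It assumes a hypothetical simplified segment (4-byte sequence number, 4-byte acknowledgement number, 2-byte checksum, then payload) rather than real TCP/IP headers, and assumes the per-send changes are limited to those header fields. The checksum is the standard Internet checksum (RFC 1071); the payload’s contribution is computed once ahead of time, so each transmission only writes the header fields into the prepared buffer.

```python
import struct

# Hypothetical simplified segment layout (not real TCP):
# 4-byte sequence number, 4-byte acknowledgement number, 2-byte checksum, payload.
HEADER = struct.Struct("!IIH")


def fold(total: int) -> int:
    """Fold carries back into 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_complement_sum(data) -> int:
    """16-bit ones'-complement sum used by the Internet checksum (RFC 1071)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"   # pad an odd trailing byte
    return fold(sum(word for (word,) in struct.iter_unpack("!H", data)))


class PrePacketizedSegment:
    def __init__(self, payload: bytes):
        # Done once, ahead of time: the payload is laid out in its final
        # packet buffer and its checksum contribution is precomputed.
        self.buf = bytearray(HEADER.size + len(payload))
        self.buf[HEADER.size:] = payload
        self.payload_sum = ones_complement_sum(payload)

    def prepare(self, seq: int, ack: int) -> memoryview:
        # Per transmission: only the header fields are written, in one pass;
        # the payload bytes are neither copied nor re-read.
        header_sum = ones_complement_sum(struct.pack("!II", seq, ack))
        checksum = ~fold(self.payload_sum + header_sum) & 0xFFFF
        HEADER.pack_into(self.buf, 0, seq, ack, checksum)
        return memoryview(self.buf)
```

A receiver can validate the segment by checking that the ones'-complement sum over the whole buffer is `0xFFFF`.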
A complete zero-copy stack
• Pre-copy method
  • Maintains more than one copy of each packet
  • Has the potential to thrash the CPU’s L3 cache
• memcpy method
  • Maintains one long-term copy and creates ephemeral copies
  • Requires more CPU work (a copy for each transmission)
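The trade-off between the two methods can be sketched in Python under simplifying assumptions: `PreCopyStore`, `MemcpyStore`, `copies`, and `slot_size` are hypothetical, and a single TX slot stands in for a NIC transmit ring. Pre-copy rotates among replicas so that a recently handed-out buffer (which, we assume, may still be owned by the NIC) is not immediately reused, at the cost of a larger memory footprint; memcpy keeps one copy per packet but pays for a copy on every send.

```python
class PreCopyStore:
    """Pre-copy: keep `copies` ready-to-send replicas of every packet."""

    def __init__(self, packets, copies):
        self.replicas = [[bytearray(p) for _ in range(copies)] for p in packets]
        self.next_replica = [0] * len(packets)

    def buffer_for(self, i):
        # No per-send copy: hand out the next replica in rotation.
        k = self.next_replica[i]
        self.next_replica[i] = (k + 1) % len(self.replicas[i])
        return self.replicas[i][k]

    def footprint(self):
        return sum(len(r) for reps in self.replicas for r in reps)


class MemcpyStore:
    """memcpy: keep one long-term copy and build an ephemeral copy per send."""

    def __init__(self, packets, slot_size):
        self.masters = [bytes(p) for p in packets]
        self.tx_slot = bytearray(slot_size)   # hypothetical single TX slot

    def buffer_for(self, i):
        master = self.masters[i]
        if len(master) > len(self.tx_slot):
            raise ValueError("packet larger than TX slot")
        self.tx_slot[:len(master)] = master   # the extra per-send work
        return memoryview(self.tx_slot)[:len(master)]

    def footprint(self):
        return sum(len(m) for m in self.masters) + len(self.tx_slot)
```

With two packets of 1000 and 500 bytes and `copies=4`, pre-copy holds 6000 bytes of packet data, while memcpy holds 3000 bytes including a 1500-byte slot.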
How do the optimizations work?
• Batching increases the TCP RTT
• Amortization reduces per-request processing cost
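A simplified cost model (our assumption, not taken from the paper) makes both effects explicit. Let $C_{\mathrm{sys}}$ be the fixed cost of one system call, $C_{\mathrm{pkt}}$ the per-packet processing cost, and $B$ the number of packets per batch. If responses are handed to the NIC only at the next system call, the per-packet CPU cost $c(B)$ and the time $d(B)$ that the response to the first packet of a batch waits after it is built are:

$$
c(B) = \frac{C_{\mathrm{sys}}}{B} + C_{\mathrm{pkt}},
\qquad
d(B) = (B - 1)\,C_{\mathrm{pkt}} + C_{\mathrm{sys}}
$$

A larger $B$ drives $c(B)$ toward $C_{\mathrm{pkt}}$, while $d(B)$ exceeds the unbatched $d(1) = C_{\mathrm{sys}}$ by $(B - 1)\,C_{\mathrm{pkt}}$, which appears as a longer RTT.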
Intel’s DDIO
• Direct Data I/O
• On transmission: the NIC can pull data from the L3 cache without a detour through system memory
• On reception: DMA can place data directly in the processor’s L3 cache
Evaluation
[Evaluation figures from the paper]
DDIO
• Pre-copy case: DDIO pulls incoming data that the application never touches into the L3 cache, so the file data cannot be cached
• memcpy case: the CPU loads the file data into the cache as it makes each copy
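A toy model can show why the working-set size matters here. The Python sketch below uses a fully associative LRU cache that tracks whole buffers; it is an assumption-laden illustration, not a model of a real set-associative L3 cache or of DDIO itself, and `LruCache` and `simulate` are hypothetical names. Pre-copy cycles through `n_packets * copies` replicas, while memcpy touches only the master copies plus one TX slot. A hit stands for TX data that would already be cached when the NIC reads it.

```python
from collections import OrderedDict


class LruCache:
    """Toy fully associative LRU cache; keys are whole buffers, not cache lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)         # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = None
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def simulate(mode, n_packets, copies, capacity, rounds):
    """Send every packet once per round; return (hits, misses) for TX buffers."""
    cache = LruCache(capacity)
    for r in range(rounds):
        for p in range(n_packets):
            if mode == "pre-copy":
                # A different replica of packet p is sent in each round.
                cache.access(("replica", p, r % copies))
            elif mode == "memcpy":
                # Read the single master copy, then write the shared TX slot.
                cache.access(("master", p))
                cache.access(("tx-slot",))
            else:
                raise ValueError(f"unknown mode: {mode}")
    return cache.hits, cache.misses
```

With 8 packets, 4 copies, room for 16 buffers, and 8 rounds, pre-copy misses on all 64 accesses, while memcpy misses only 9 times (the first touch of each buffer) out of 128.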
Discussion
• mTCP vs. Sandstorm
• mTCP
  • Provides a UNIX-like socket programming interface
  • Provides fairness
• Sandstorm’s TCP
  • A higher-level stack does not wrap the lower-level stack
  • Each stack is a stand-alone service; for example, an application interacts directly with libnmio
  • Amortization, the absence of queueing, and inaccurate timers mean that correctness cannot be guaranteed
  • Suitable only for a limited set of applications