Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)
IETF 70 Routing Research Group (RRG)
Fred L. [email protected]
MTU Determination Problem
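[Figure: end-to-end path from the Original Source (MTU=64KB) through an Edge Network, across a Tunnel (Tunnel Near-End to Tunnel Far-End, over Internet/Enterprise Network/MANET/etc.), and through a second Edge Network to the Final Destination (EMTU_R=64KB). Link MTUs shown: 64KB, 2KB, 64KB, 9KB, 4KB, 64KB. The Tunnel Far-End has EMTU_R=8KB; the tunnel MTU is unknown (MTU=??).]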
Tunnel MTU Issues (1)
• IPv4 path MTU discovery has limitations for tunnels:
  • ICMPv4 “packet too big” (PTB) messages dropped by middleboxes – result is undiagnosable black hole
  • PTB messages returned to the tunnel near-end (TNE) can’t be translated into PTBs to send back to the original source
  • PTB messages easily forged by off-path attackers
  • does not work in the presence of multi-MTU subnets, i.e., last-hop router cannot know the MTU of the tunnel far-end (TFE)
CHALLENGE: TNE CANNOT BLINDLY ADMIT BIG PACKETS INTO THE TUNNEL WITH DF=1
Tunnel MTU Issues (2)
• Unmitigated IPv4 fragmentation is harmful:
  • Existing TNEs have no way of knowing the Effective MTU to Receive (EMTU_R) of the TFE
  • Existing TNEs have no way of knowing the reassembly timeout value used by the TFE
  • Slow-path processing in fragmenting middleboxes
  • TNE has no way of controlling NATs that rewrite ip_id
  • IP fragment misassociations at TFE can cause undetected data corruption
CHALLENGE: TNE CANNOT BLINDLY SEND BIG PACKETS INTO THE TUNNEL WITH DF=0
Goals
• Robust support for packets of various sizes
• Maximize Packet Delivery Ratio
• Manage fragmentation if necessary
• Avoid in-the-network fragmentation
• Avoid reassembly misassociations at TFE
• Coexist with end-to-end MTU determination
• Support larger MTUs
Solution: SPRITE-MTU
• UDP Echo service for tunnel MTU discovery
• Soft state management to track tunnel parameters (per RFC2003)
• Explicit Congestion Notification for robust operation over tunnels with small MTUs
• Improves operating conditions for end-to-end path MTU determination (RFC4821)
RESULT: DISCOVERS TUNNEL MTU AND MINIMIZES NUMBER OF FRAGMENTS PER PACKET (PREFERABLY DOWN TO 1)
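As an illustration of how echo-based probing could converge on a tunnel MTU, here is a minimal sketch. The slides do not specify a search strategy; binary search is an assumption made here, and `probe` is a hypothetical callable standing in for "send a UDP echo of this size with DF=1 and report whether the reply came back". It also assumes that if a given size gets through, every smaller size does too.

```python
from typing import Callable, Optional

def discover_mtu(probe: Callable[[int], bool], low: int, high: int) -> Optional[int]:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    `probe` is hypothetical: it stands in for a UDP echo round trip of the
    given size. Returns None if even the smallest size fails.
    Assumes success is monotonic in size (an assumption, not from the slides).
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid                # mid got through; search above it
        else:
            high = mid - 1           # mid was lost; search below it
    return low
```

For example, with a simulated path that passes anything up to 1400 bytes, `discover_mtu(lambda size: size <= 1400, 1280, 1500)` settles on 1400.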
Relevant Elements of Normative Specifications
• RFC2003 (IPv4-in-IPv4 Encapsulation)
  • Basic encapsulation/decapsulation specifications
  • Inner packet fragmentation when DF=0 and packet larger than the TFE’s EMTU_R
  • Setting of DF
  • Tunnel Soft State
  • Sending packet while also returning PTB
• RFC4213 (IPv6-in-IPv4 Encapsulation)
  • Basic encapsulation/decapsulation specifications
  • Conceptual sending algorithm
  • “Configuration knob” threshold for determining when an outer packet is fragmentable
Configuration Knob for Fragmentable Outer Packets
• Two purposes: 1) avoid TFE receive buffer overrun, 2) avoid/minimize fragmentation on the TNE->TFE path
• Below threshold, admit packets into tunnel without returning PTBs (TFE may need to reassemble)
• Above threshold, admit packet into tunnel and return PTB if packet is larger than cached MTU
• Minimums are 1280 bytes for IPv6 (MUST) and 576 bytes for IPv4 (SHOULD)
• May be set to larger values based on knowledge of: 1) TFE’s EMTU_R, 2) other encapsulations that may occur on the TNE->TFE path
• Ideally, push configuration knob up to 1480 (or better yet 1500) – but not always possible
Setting the Configuration Knob (Assuming ENCAPS=20)
• 1280: safest option
• 1280 – ~1380: probably safe for most paths
• 1380 – 1480: safe only if little/no additional encaps
• 1480 – 1500: only safe if path has larger-than-1500 MTU and TFE has larger-than-minimum EMTU_R
• optimizing down to the byte level not always possible
[Figure: threshold scale from 1280 to 1500]
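The bands above can be sketched as a small lookup, together with the arithmetic behind them. This is illustrative only: the function names are hypothetical, and because the slide's ranges share their endpoints, each shared boundary value (1380, 1480) is assigned to the lower band here as an assumption.

```python
ENCAPS = 20  # bytes of outer IPv4 header, as assumed on the slide

def outer_size(inner_size: int, encaps: int = ENCAPS) -> int:
    """Size of the encapsulated (outer) packet for a given inner packet size."""
    return inner_size + encaps

def classify_threshold(threshold: int) -> str:
    """Map a candidate configuration-knob value to the slide's safety bands.

    Boundary values are assigned to the lower band (an assumption).
    """
    if threshold < 1280 or threshold > 1500:
        return "outside the 1280-1500 range discussed"
    if threshold == 1280:
        return "safest option"
    if threshold <= 1380:
        return "probably safe for most paths"
    if threshold <= 1480:
        return "safe only if little/no additional encaps"
    return "only safe if path MTU > 1500 and TFE EMTU_R is larger than minimum"
```

With ENCAPS=20, an inner packet of 1480 bytes yields a 1500-byte outer packet, which is why 1480 is the top of the band that fits a 1500-byte path with no further encapsulation, while a 1500-byte inner packet yields 1520 bytes and needs a larger path MTU.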
Setting DF
• Set DF=1 in all packets larger than threshold
• Set DF=1 even if TNE fragments packet before sending into tunnel
• MAY set DF=0 to increase PDR and avoid spurious PTBs, but if so must use pacing and/or soft state feedback to manage fragmentation
Sending Big Packets into Tunnel
• If packet is no larger than the tunnel’s probed MTU (initially set to the configuration threshold) send packet into tunnel with DF=1
• If packet is larger, send packet into tunnel with DF=1 but also send PTB back to source
  • Sending packet increases PDR and also allows end-to-end MTU determination (RFC4821) to determine actual MTU
  • Sending PTB alerts RFC4821 nodes that there *may* be an MTU restriction
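The sending rule on this slide can be sketched as a single decision function. The class and field names are hypothetical, and reporting the probed MTU inside the PTB is an assumption; the slide only says that a PTB is sent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tunnel:
    threshold: int                    # configuration knob value
    probed_mtu: Optional[int] = None  # None until a probe has completed

    def current_mtu(self) -> int:
        # The probed MTU is initially set to the configuration threshold.
        return self.threshold if self.probed_mtu is None else self.probed_mtu

@dataclass
class Decision:
    df: int                 # DF bit for the packet sent into the tunnel
    send_ptb: bool          # whether a PTB also goes back to the source
    ptb_mtu: Optional[int]  # MTU reported in the PTB (assumed: the probed MTU)

def admit(tunnel: Tunnel, packet_len: int) -> Decision:
    """The packet is always sent with DF=1; a PTB is added if it exceeds the probed MTU."""
    mtu = tunnel.current_mtu()
    if packet_len <= mtu:
        return Decision(df=1, send_ptb=False, ptb_mtu=None)
    return Decision(df=1, send_ptb=True, ptb_mtu=mtu)
```

Note that the oversized packet is still forwarded in both branches; only the PTB differs.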
What if it Might be Fragmenting?
• Institute pacing until path MTU to TFE is probed
• If probed size is no smaller than configuration threshold, relax pacing
• If probed size is smaller than configuration threshold, or no probes returned, synchronize soft state with TFE
• Worst case: fast links with small MTUs on TNE->TFE path (need to carefully monitor TFE’s reassembly)
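A minimal sketch of the decision taken once probing finishes, under the assumption that a missing probe result is represented as `None`; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    RELAX_PACING = "relax pacing"
    SYNC_SOFT_STATE = "synchronize soft state with TFE"

def after_probe(threshold: int, probed_size: Optional[int]) -> Action:
    """Choose the next step once probing of the path to the TFE has finished.

    probed_size is None when no probes were returned (an assumed encoding).
    """
    if probed_size is not None and probed_size >= threshold:
        return Action.RELAX_PACING
    # Smaller than the threshold, or no probes returned at all.
    return Action.SYNC_SOFT_STATE
```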
Soft State Management Protocol
• TNE creates soft state and sends initial sprite to TFE using TFE’s on-link link local address as destination
  • TNE is asking TFE to synchronize state
• TFE sends reply using its current sprite address as source
  • no soft state created yet – avoid buffer attacks
• TNE sends sprite using TFE’s current sprite address as destination
  • TFE creates soft state; begins monitoring received packets
• TNE and TFE continuously exchange sprites while packets are actively using the tunnel
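The exchange can be modeled in memory to show when each side creates state. This is a sketch, not a wire implementation: the class names, the dictionary layout, and the example addresses are all hypothetical, and message delivery is reduced to a direct method call.

```python
from typing import Optional

class TFE:
    def __init__(self, link_local: str, sprite_addr: str):
        self.link_local = link_local
        self.sprite_addr = sprite_addr
        self.soft_state = {}  # keyed by TNE address

    def receive_sprite(self, src: str, dst: str) -> Optional[str]:
        """Handle a sprite; return the source address used in the reply."""
        if dst == self.link_local:
            # Initial sprite: reply from the sprite address, create NO state yet.
            return self.sprite_addr
        if dst == self.sprite_addr:
            # Sender has shown it received our reply: now create soft state.
            self.soft_state.setdefault(src, {"monitoring": True})
            return self.sprite_addr
        return None

class TNE:
    def __init__(self, addr: str):
        self.addr = addr
        self.soft_state = {}  # keyed by TFE link-local address

    def synchronize(self, tfe: TFE, tfe_link_local: str) -> bool:
        # Step 1: create soft state and send the initial sprite.
        self.soft_state[tfe_link_local] = {"sprite_addr": None}
        reply_src = tfe.receive_sprite(self.addr, tfe_link_local)
        if reply_src is None:
            return False
        # Step 2: learn the TFE's current sprite address from the reply.
        self.soft_state[tfe_link_local]["sprite_addr"] = reply_src
        # Step 3: send a sprite to that address; the TFE creates state now.
        return tfe.receive_sprite(self.addr, reply_src) is not None
```

Deferring TFE state until the third message means a sender that never sees the reply cannot make the TFE allocate anything, which is the buffer-attack concern noted on the slide.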
Sprite-mtu Checksum
• “sprite-mtu checksum” sums every 10th byte of the packet using the Fletcher-16 algorithm
• While synchronized, TNE includes trailing sprite-mtu checksum
• TFE checks checksum and discards packet if checksum disagrees
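A minimal sketch of the checksum described above, using the standard Fletcher-16 algorithm (two running sums modulo 255). Several details are assumptions, since the slide does not give them: sampling starts at byte offset 0 and takes offsets 0, 10, 20, and so on; the trailer is two bytes in big-endian order; the function names are hypothetical.

```python
def fletcher16(data: bytes) -> int:
    """Standard Fletcher-16: two running sums, each modulo 255."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

def sprite_checksum(packet: bytes, stride: int = 10) -> int:
    """Fletcher-16 over every 10th byte (assumed: offsets 0, 10, 20, ...)."""
    return fletcher16(packet[::stride])

def append_checksum(packet: bytes) -> bytes:
    """TNE side: add the trailing checksum (assumed: 2 bytes, big-endian)."""
    return packet + sprite_checksum(packet).to_bytes(2, "big")

def verify(packet_with_trailer: bytes) -> bool:
    """TFE side: recompute over the body and compare with the trailer."""
    if len(packet_with_trailer) < 2:
        return False
    body, trailer = packet_with_trailer[:-2], packet_with_trailer[-2:]
    return sprite_checksum(body) == int.from_bytes(trailer, "big")
```

Because only the sampled bytes are summed, a change to a byte at a sampled offset is normally caught while a change at any other offset is not; the checksum is a cheap spot check rather than full coverage of the packet.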
Explicit Congestion Notification
• TNE sets ECT(0) or ECT(1) codepoint in its sprites
• When TFE detects incorrect sprite-mtu checksums, it begins setting CE codepoint in its sprite replies
• TNE institutes pacing while receiving sprite replies with CE codepoint
• TNE relaxes pacing when CE codepoint no longer set
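The feedback loop can be sketched with the ECN codepoint values from RFC 3168 (the low two bits of the IPv4 TOS or IPv6 Traffic Class byte). The class names are hypothetical, and how long the TFE keeps marking CE after a bad checksum is not stated on the slide; here it is assumed to mark only the next reply.

```python
# ECN codepoints, RFC 3168: low two bits of the TOS / Traffic Class byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

def ecn_of(tos: int) -> int:
    """Extract the ECN codepoint from a TOS / Traffic Class byte."""
    return tos & 0b11

def set_ecn(tos: int, codepoint: int) -> int:
    """Return the byte with its ECN field replaced, upper six bits kept."""
    return (tos & 0xFC) | codepoint

class TfeMonitor:
    """TFE side: remembers whether a bad sprite-mtu checksum was seen."""
    def __init__(self):
        self.bad_checksum_seen = False

    def on_data_packet(self, checksum_ok: bool) -> None:
        if not checksum_ok:
            self.bad_checksum_seen = True

    def reply_tos(self, sprite_tos: int) -> int:
        # Mark CE if a bad checksum was seen; otherwise echo the sprite's ECT.
        codepoint = CE if self.bad_checksum_seen else ecn_of(sprite_tos)
        self.bad_checksum_seen = False  # assumption: flag covers one reply only
        return set_ecn(sprite_tos, codepoint)

class TnePacer:
    """TNE side: paces while replies carry CE, relaxes when they stop."""
    def __init__(self):
        self.pacing = False

    def on_sprite_reply(self, reply_tos: int) -> None:
        self.pacing = ecn_of(reply_tos) == CE
```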
Futures
• IEEE 802.3as Frame Expansion
  • larger than 1500 MTUs for 802.3 links
  • may allow setting configuration threshold to > 1500
• Larger EMTU_Rs for tunnel endpoints (up to 2KB)
• Gigabit Ethernet 9KB jumbo frames
• Widespread use of sprite-mtu
• Widespread use of RFC4821
TODO
• Some encapsulations dangerous with any level of outer fragmentation – e.g., Teredo (IPv6/UDP/IPv4)
  • NATs re-write ‘ip_id’
  • ‘ip_id’ collisions when multiple nodes behind NAT talk to the same TFE
  • solution: “UDP Fragmentation for Teredo” (draft to be written)
• Use ICMP echo request/reply as fallback if TFE does not implement sprite-mtu (is it worth it?)