Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Page 1: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

IETF 70 Routing Research Group (RRG)

Fred L. Templin [email protected]

Page 2: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

MTU Determination Problem

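[Figure: end-to-end path from the Original Source (MTU=64KB), through an edge network to the Tunnel Near-End, across the tunnel (Internet/Enterprise Network/MANET/etc.) to the Tunnel Far-End (EMTU_R=8KB), and through an edge network to the Final Destination (EMTU_R=64KB). Link MTU labels shown in the figure: 64KB, 2KB, 64KB, 9KB, 4KB, 64KB. Tunnel MTU=??]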

Page 3: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Tunnel MTU Issues (1)

• IPv4 path MTU discovery has limitations for tunnels:
  • ICMPv4 “packet too big” (PTB) messages dropped by middleboxes – result is undiagnosable black hole
  • PTB messages returned to the tunnel near-end (TNE) can’t be translated into PTBs to send back to the original source
  • PTB messages easily forged by off-path attackers
  • does not work in the presence of multi-MTU subnets, i.e., last-hop router cannot know the MTU of the tunnel far-end (TFE)

CHALLENGE: TNE CANNOT BLINDLY ADMIT BIG PACKETS INTO THE TUNNEL WITH DF=1

Page 4: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Tunnel MTU Issues (2)

• Unmitigated IPv4 fragmentation is harmful:
  • Existing TNEs have no way of knowing the Effective MTU to Receive (EMTU_R) of the TFE
  • Existing TNEs have no way of knowing the reassembly timeout value used by the TFE
  • Slow-path processing in fragmenting middleboxes
  • TNE has no way of controlling NATs that rewrite ip_id
  • IP fragment misassociations at TFE can cause undetected data corruption

CHALLENGE: TNE CANNOT BLINDLY SEND BIG PACKETS INTO THE TUNNEL WITH DF=0

Page 5: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Goals

• Robust support for packets of various sizes
• Maximize Packet Delivery Ratio (PDR)
• Manage fragmentation if necessary
• Avoid in-the-network fragmentation
• Avoid reassembly misassociations at TFE
• Coexist with end-to-end MTU determination
• Support larger MTUs

Page 6: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Solution: SPRITE-MTU

• UDP Echo service for tunnel MTU discovery
• Soft state management to track tunnel parameters (per RFC2003)
• Explicit Congestion Notification for robust operation over tunnels with small MTUs
• Improves operating conditions for end-to-end path MTU determination (RFC4821)

RESULT: DISCOVERS TUNNEL MTU AND MINIMIZES NUMBER OF FRAGMENTS PER PACKET (PREFERABLY DOWN TO 1)
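A minimal sketch of how echo-based probing could converge on a tunnel MTU. The slides only state that a UDP echo service is used; the binary search, the function name `probe_tunnel_mtu`, and the `echo_ok` callback (standing in for "send a probe of this size with DF=1 and wait for the echo") are assumptions for illustration, not the draft's algorithm.

```python
def probe_tunnel_mtu(echo_ok, lo, hi):
    """Return the largest size in [lo, hi] for which echo_ok(size) is True.

    echo_ok is a hypothetical callback: it should send a probe of the
    given size to the TFE and report whether an echo came back.
    Assumes that if a size gets through, every smaller size does too.
    Returns None when even the smallest probe is not echoed.
    """
    if not echo_ok(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always makes progress
        if echo_ok(mid):
            lo = mid               # mid got through; search larger sizes
        else:
            hi = mid - 1           # mid was lost; search smaller sizes
    return lo


# Simulated path that silently drops anything above 1400 bytes.
print(probe_tunnel_mtu(lambda size: size <= 1400, 1280, 1500))  # 1400
```

A real prober would also need timeouts and retries, since a lost echo does not by itself prove an MTU restriction.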

Page 7: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Relevant Elements of Normative Specifications

• RFC2003 (IPv4-in-IPv4 Encapsulation)
  • Basic encapsulation/decapsulation specifications
  • Inner packet fragmentation when DF=0 and packet larger than the TFE’s EMTU_R
  • Setting of DF
  • Tunnel Soft State
  • Sending packet while also returning PTB

• RFC4213 (IPv6-in-IPv4 Encapsulation)
  • Basic encapsulation/decapsulation specifications
  • Conceptual sending algorithm
  • “Configuration knob” threshold for determining when an outer packet is fragmentable

Page 8: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Configuration Knob for Fragmentable Outer Packets

• Two purposes: 1) avoid TFE receive buffer overrun, 2) avoid/minimize fragmentation on the TNE->TFE path

• Below threshold, admit packets into tunnel without returning PTBs (TFE may need to reassemble)

• Above threshold, admit packet into tunnel and return PTB if packet is larger than cached MTU

• Minimums are 1280 bytes for IPv6 (MUST) and 576 bytes for IPv4 (SHOULD)

• May be set to larger values based on knowledge of: 1) TFE’s EMTU_R, 2) other encapsulations that may occur on the TNE->TFE path

• Ideally, push configuration knob up to 1480 (or better yet 1500) – but not always possible
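The below/above-threshold rule can be sketched as a small decision function. The name `admit`, the tuple return value, and treating a packet exactly at the threshold as "below" are assumptions for illustration.

```python
def admit(packet_len, threshold, cached_mtu):
    """Return (admit_into_tunnel, return_ptb) for a packet at the TNE.

    threshold  -- the configuration knob
    cached_mtu -- the tunnel MTU currently cached by the TNE
    """
    if packet_len <= threshold:
        # At or below the knob: admit without a PTB; the TFE may reassemble.
        return True, False
    # Above the knob: still admit, but return a PTB if the packet
    # exceeds the cached MTU.
    return True, packet_len > cached_mtu


print(admit(1200, 1280, 1400))  # (True, False)
print(admit(1350, 1280, 1400))  # (True, False)
print(admit(1450, 1280, 1400))  # (True, True)
```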

Page 9: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Setting the Configuration Knob (Assuming ENCAPS=20)

• 1280: safest option
• 1280 – ~1380: probably safe for most paths
• 1380 – 1480: safe only if little/no additional encaps
• 1480 – 1500: only safe if path has larger-than-1500 MTU and TFE has larger-than-minimum EMTU_R

• optimizing down to the byte level not always possible

[Figure: scale of configuration knob settings from 1280 to 1500]
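The bands above follow from simple arithmetic on the encapsulated packet size. The sketch below assumes links on the TNE->TFE path have a 1500-byte MTU; `headroom` and `extra_encaps` are hypothetical names, not terms from the slides.

```python
ENCAPS = 20      # outer header size assumed on the slide
LINK_MTU = 1500  # assumption: common link MTU on the TNE->TFE path


def headroom(threshold, extra_encaps=0, link_mtu=LINK_MTU):
    """Bytes left on a link after encapsulating a threshold-sized packet.

    A negative result means the outer packet would not fit in one piece.
    """
    return link_mtu - (threshold + ENCAPS + extra_encaps)


for knob in (1280, 1380, 1480, 1500):
    print(knob, headroom(knob))
# 1280 200
# 1380 100
# 1480 0
# 1500 -20
```

A knob of 1480 leaves no room for any further encapsulation, and a knob of 1500 fits only when the path MTU is larger than 1500, which matches the last two bands.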

Page 10: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Setting DF

• Set DF=1 in all packets larger than threshold
• Set DF=1 even if TNE fragments packet before sending into tunnel
• MAY set DF=0 to increase PDR and avoid spurious PTBs, but if so must use pacing and/or soft state feedback to manage fragmentation

Page 11: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Sending Big Packets into Tunnel

• If packet is no larger than the tunnel’s probed MTU (initially set to the configuration threshold) send packet into tunnel with DF=1

• If packet is larger, send packet into tunnel with DF=1 but also send PTB back to source
  • Sending packet increases PDR and also allows end-to-end MTU determination (RFC4821) to determine actual MTU
  • Sending PTB alerts RFC4821 nodes that there *may* be an MTU restriction
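For reference, a sketch of the ICMPv4 PTB message the TNE would send back to the source: Destination Unreachable (type 3), code 4, carrying a next-hop MTU as in RFC 1191. Only the ICMP message is built here, not the enclosing IP header. Advertising the tunnel's probed MTU in the next-hop MTU field is an assumption, and the function names are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole 16-bit word
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ptb(original_packet: bytes, next_hop_mtu: int) -> bytes:
    """Build an ICMPv4 'fragmentation needed' message for original_packet."""
    ihl = (original_packet[0] & 0x0F) * 4     # IPv4 header length in bytes
    quoted = original_packet[: ihl + 8]       # IP header + first 8 data bytes
    # type=3, code=4, checksum placeholder, unused, next-hop MTU
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    csum = internet_checksum(header + quoted)
    return struct.pack("!BBHHH", 3, 4, csum, 0, next_hop_mtu) + quoted


# Dummy 40-byte IPv4 packet with a 20-byte header (version 4, IHL 5).
ptb = build_ptb(bytes([0x45]) + bytes(39), 1400)
print(len(ptb), ptb[:2].hex())
```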

Page 12: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

What if it Might be Fragmenting?

• Institute pacing until path MTU to TFE is probed
• If probed size is no smaller than configuration threshold, relax pacing
• If probed size is smaller than configuration threshold, or no probes returned, synchronize soft state with TFE
• Worst case: fast links with small MTUs on TNE->TFE path (need to carefully monitor TFE’s reassembly)
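The two outcomes after probing can be written as one decision. This is a sketch only; `Action`, `after_probe`, and the use of `None` for "no probes returned" are assumptions for illustration.

```python
from enum import Enum


class Action(Enum):
    RELAX_PACING = "relax pacing"
    SYNC_SOFT_STATE = "synchronize soft state with TFE"


def after_probe(probed_size, threshold):
    """Pick the TNE's next step once a probe round has finished.

    probed_size is None when no probes were returned.
    """
    if probed_size is not None and probed_size >= threshold:
        return Action.RELAX_PACING
    return Action.SYNC_SOFT_STATE


print(after_probe(1400, 1280).value)  # relax pacing
print(after_probe(1100, 1280).value)  # synchronize soft state with TFE
print(after_probe(None, 1280).value)  # synchronize soft state with TFE
```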

Page 13: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Soft State Management Protocol

• TNE creates soft state and sends initial sprite to TFE using TFE’s on-link link-local address as destination
  • TNE is asking TFE to synchronize state

• TFE sends reply using its current sprite address as source
  • no soft state created yet – avoid buffer attacks

• TNE sends sprite using TFE’s current sprite address as destination
  • TFE creates soft state; begins monitoring received packets

• TNE and TFE continuously exchange sprites while packets are actively using the tunnel
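The exchange above can be simulated in a few lines. Everything here is a sketch under assumptions: the class names, the dictionary-based state, and the address strings are hypothetical, and no real packets are sent.

```python
class Tfe:
    def __init__(self, link_local, sprite_addr):
        self.link_local = link_local
        self.sprite_addr = sprite_addr    # current sprite address
        self.soft_state = {}              # keyed by TNE address

    def receive_sprite(self, src, dst):
        # State is created only for sprites sent to the current sprite
        # address, so a bare first sprite cannot make the TFE allocate memory.
        if dst == self.sprite_addr:
            self.soft_state[src] = {"monitoring": True}
        # The reply always uses the current sprite address as its source.
        return {"src": self.sprite_addr, "dst": src}


class Tne:
    def __init__(self, addr):
        self.addr = addr
        self.soft_state = {}              # keyed by TFE link-local address

    def synchronize(self, tfe):
        # Step 1: create soft state, send initial sprite to the link-local address.
        self.soft_state[tfe.link_local] = {"sprite_addr": None}
        reply = tfe.receive_sprite(self.addr, tfe.link_local)
        # Step 2: learn the TFE's current sprite address from the reply source.
        self.soft_state[tfe.link_local]["sprite_addr"] = reply["src"]
        # Step 3: send a sprite to that address; the TFE now creates state.
        tfe.receive_sprite(self.addr, reply["src"])


tfe = Tfe(link_local="fe80::2", sprite_addr="sprite-addr-1")
tne = Tne(addr="fe80::1")
tne.synchronize(tfe)
print(tfe.soft_state)  # {'fe80::1': {'monitoring': True}}
```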

Page 14: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Sprite-mtu Checksum

• “sprite-mtu checksum” sums every 10th byte of the packet using the Fletcher-16 algorithm

• While synchronized, TNE includes trailing sprite-mtu checksum

• TFE checks checksum and discards packet if checksum disagrees
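A sketch of a sampled Fletcher-16 checksum. The Fletcher-16 routine is the standard one (modulo 255); which byte offsets are sampled (0, 10, 20, ... here), the 2-byte big-endian trailer, and the function names are assumptions, since the slide does not specify them.

```python
STRIDE = 10  # "every 10th byte"; starting at offset 0 is an assumption


def fletcher16(data: bytes) -> int:
    """Standard Fletcher-16 over all bytes of data."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


def sprite_checksum(packet: bytes) -> int:
    """Fletcher-16 over every STRIDE-th byte of the packet."""
    return fletcher16(packet[::STRIDE])


def add_trailer(packet: bytes) -> bytes:
    """TNE side: append the checksum as a trailer (layout is an assumption)."""
    return packet + sprite_checksum(packet).to_bytes(2, "big")


def verify(packet_with_trailer: bytes) -> bool:
    """TFE side: recompute and compare; False means discard the packet."""
    if len(packet_with_trailer) < 2:
        return False
    body, trailer = packet_with_trailer[:-2], packet_with_trailer[-2:]
    return sprite_checksum(body) == int.from_bytes(trailer, "big")


sent = add_trailer(bytes(range(100)))
print(verify(sent))                      # True
print(verify(b"\x01" + sent[1:]))        # False: a sampled byte changed
```

Sampling keeps the per-packet cost low, but in this sketch corruption confined to unsampled bytes goes undetected.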

Page 15: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Explicit Congestion Notification

• TNE sets ECT(0) or ECT(1) codepoint in its sprites
• When TFE detects incorrect sprite-mtu checksums, it begins setting CE codepoint in its sprite replies
• TNE institutes pacing while receiving sprite replies with CE codepoint
• TNE relaxes pacing when CE codepoint no longer set
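A sketch of the feedback loop using the two-bit ECN codepoints from RFC 3168. The class names are hypothetical, and the rule for when the TFE stops marking CE (here: once a reply interval passes with no checksum failures) is an assumption, since the slide does not say.

```python
# ECN field values (RFC 3168)
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class Tfe:
    def __init__(self):
        self.bad_since_last_reply = False

    def on_data_packet(self, checksum_ok):
        if not checksum_ok:
            self.bad_since_last_reply = True

    def reply_ecn(self, sprite_ecn):
        """ECN codepoint to place in the sprite reply."""
        ecn = CE if self.bad_since_last_reply else sprite_ecn
        self.bad_since_last_reply = False   # assumption: reset per reply
        return ecn


class Tne:
    def __init__(self):
        self.pacing = False

    def sprite_ecn(self):
        return ECT_0                        # ECT(1) would also be allowed

    def on_reply(self, ecn):
        # Pace while replies carry CE; relax once CE is no longer set.
        self.pacing = (ecn == CE)


tne, tfe = Tne(), Tfe()
tfe.on_data_packet(checksum_ok=False)
tne.on_reply(tfe.reply_ecn(tne.sprite_ecn()))
print(tne.pacing)  # True
tfe.on_data_packet(checksum_ok=True)
tne.on_reply(tfe.reply_ecn(tne.sprite_ecn()))
print(tne.pacing)  # False
```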

Page 16: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

Futures

• IEEE 802.3as Frame Expansion
  • larger than 1500 MTUs for 802.3 links
  • may allow setting configuration threshold to > 1500

• Larger EMTU_Rs for tunnel endpoints (up to 2KB)
• Gigabit Ethernet 9KB jumbo frames
• Widespread use of sprite-mtu
• Widespread use of RFC4821

Page 17: Simple Protocol for Robust Tunnel Endpoint MTU Determination (sprite-mtu)

TODO

• Some encapsulations dangerous with any level of outer fragmentation – e.g., Teredo (IPv6/UDP/IPv4)
  • NATs re-write ‘ip_id’
  • ‘ip_id’ collisions when multiple nodes behind NAT talk to the same TFE
  • solution: “UDP Fragmentation for Teredo” (draft to be written)

• Use ICMP echo request/reply as fallback if TFE does not implement sprite-mtu (is it worth it?)