L2 over L3 Encapsulations (English)
© 2014 VMware Inc. All rights reserved.
L2 over L3 Encapsulations
VXLAN, NVGRE, STT, Geneve, etc.
Motonori Shindo
Network & Security Business Unit
VMware
July 13, 2014
CONFIDENTIAL 2
Tunneling vs Encapsulation
• Tunneling Protocols
– Signaling + Encapsulation
• Usually come with some sort of “signaling” mechanism, which manages the tunnel.
• Encapsulation is the other part of a tunneling protocol.
– E.g. PPTP, L2TP, IPsec (IKE), etc.
• Encapsulations
– A way of wrapping (i.e. encapsulating) something
– E.g. GRE, VXLAN, NVGRE, STT, (Ethernet, IP, TCP, …)
• What I’m going to talk about today is “encapsulation”
• I am not going to talk about the “control plane” today (though it’s very important)
L2 over L3 encapsulations typically seen in Network Virtualization
• GRE (Generic Routing Encapsulation) *
• VXLAN (Virtual Extensible LAN)
• NVGRE (Network Virtualization using GRE)
• STT (Stateless Transport Tunneling)
* Strictly speaking, GRE is not an L2 over L3 encapsulation, as it can encapsulate not only L2 but also L3
VXLAN
• Proposed by Cumulus / Arista / Broadcom / Cisco / VMware / Citrix / Red Hat
– draft-mahalingam-dutt-dcops-vxlan-09.txt
• Extends the VLAN ID (12 bits) to a VNI (24 bits)
• Encapsulation by UDP/IP
– L3 overlay
– Multipath
• Encapsulates Ethernet frames only
• Simple enough to be implemented in hardware
• Forming an “ecosystem”
VXLAN Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
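As a minimal sketch of the layout in the VXLAN header diagram, the 8-byte header can be packed and parsed with Python's struct module. The function names are illustrative assumptions, not part of any VXLAN implementation; only the field layout (flags with the I bit, 24-bit VNI, reserved bits) comes from the diagram.

```python
import struct

# "I" bit in the flags byte (R R R R I R R R): the VNI field is valid.
VXLAN_FLAG_I = 0x08


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI from a VXLAN header, checking the I flag first."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8
```

For example, pack_vxlan_header(5000) yields the bytes 08 00 00 00 00 13 88 00, and unpack_vxlan_header() on those bytes returns 5000.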
Fabric Network
• Service Oriented Architecture
• From a 2- or 3-tier network to Leaf & Spine
• High density and bandwidth required
• Layer 3 ECMP
• No oversubscription
• Low and uniform delay characteristic
• Wire-once, configure-once network
• Uniform network configuration
[Figure labels: WAN/Internet]
Multipath Network
• Background
– To support the significant increase in East-West traffic, multipath-based fabric networks are getting popular
• Requisites
– All packets of a given flow must traverse the same path
– Packets must carry enough “entropy” to make efficient use of the fabric
Multipath by VXLAN
Encapsulated packet (header sizes in bytes): IP (20) | UDP (8) | VXLAN (8) | original packet
Original packet: Ether | IP | TCP | Data
dst port = 4789
src port = Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc. of the original packet) *
* Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation.
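The source-port trick can be sketched as follows. Since the protocol does not define the hash, everything here is an assumption for illustration: CRC32 stands in for whatever hash an implementation picks, the function name is hypothetical, and the result is mapped into the dynamic port range 49152-65535.

```python
import zlib

VXLAN_DST_PORT = 4789  # fixed destination port, as on the slide


def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port):
    """Derive the outer UDP source port from inner flow fields (sketch)."""
    fields = (src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port)
    key = "|".join(str(f) for f in fields).encode()
    # CRC32 is only a stand-in; real implementations choose their own hash.
    flow_hash = zlib.crc32(key)
    # Assumption: keep the port inside the dynamic range 49152-65535.
    return 49152 + flow_hash % 16384
```

Every packet of one inner flow gets the same outer source port, so ECMP hashing on the outer 5-tuple keeps the flow on one path while different flows spread across the fabric.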
VXLAN Ecosystem
• Switch / Router
– Arista, Brocade, Cisco, Cumulus, DELL, HP, Huawei, Juniper, Open vSwitch, Pica8
• Operating System
– Linux, VMware
• Appliances
– A10, Citrix, F5
• Testers
– IXIA, Spirent
• ASIC / NIC
– Broadcom, Intel (Fulcrum), Emulex, Mellanox
• Cloud Orchestrator
– CloudStack, OpenStack, vCAC
Note: this is not an exhaustive list
This is a list of vendors who participated in the VXLAN interoperability test at INTEROP Tokyo 2014, which was entirely successful.
NVGRE
• Proposed by Microsoft / Arista / Intel / Google / HP / Broadcom / Emulex
– draft-sridharan-virtualization-nvgre-04.txt
• 24-bit Virtual Subnet ID (VSID) and 8-bit FlowID
• Encapsulation is GRE as is:
– Put VSID + FlowID in the Key field
– L3 overlay
– Multipath is possible (in theory) but difficult
• Windows affinity
NVGRE Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| |1|0|   Reserved0     | Ver |   Protocol Type 0x6558        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Virtual Subnet ID (VSID)        |    FlowID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
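A minimal sketch of how the VSID and FlowID share the 32-bit GRE Key field, following the NVGRE header diagram. The function name is an illustrative assumption; the flag value simply encodes the bits drawn in the diagram (Key Present set, the others clear, Ver = 0).

```python
import struct

GRE_KEY_PRESENT = 0x2000  # third bit of the first 16 bits (the "1" in the diagram)
ETHERTYPE_TEB = 0x6558    # Protocol Type shown on the slide


def pack_nvgre_header(vsid: int, flow_id: int) -> bytes:
    """Build the 8-byte GRE header used by NVGRE (hypothetical helper)."""
    if not 0 <= vsid < 2**24:
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < 2**8:
        raise ValueError("FlowID must fit in 8 bits")
    # Key field = VSID in the upper 24 bits, FlowID in the lower 8 bits.
    key = (vsid << 8) | flow_id
    return struct.pack("!HHI", GRE_KEY_PRESENT, ETHERTYPE_TEB, key)
```

Because the FlowID lives inside the GRE Key rather than in an outer UDP port, a router has to parse the GRE header to use it for load balancing.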
Multipath in NVGRE
Encapsulated packet (header sizes in bytes): IP (20) | GRE (8) | original packet
Original packet: Ether | IP | TCP | Data
FlowID = Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc. of the original packet) *
Routers / switches need to look up the Key field in the GRE header to do ideal multipathing!
* Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation.
NVGRE ecosystem
• Switch / Router
– Huawei
– Arista and Brocade say they are going to support it, but no product seems to have come out yet
• Operating System
– Microsoft (Windows Server 2012 R2)
• Appliances
– F5
• ASIC / NIC
– Emulex, Mellanox
• Cloud Orchestrator
– System Center 2012 R2
Note: this is not an exhaustive list
STT (Stateless Transport Tunneling)
• L2 over L3 encapsulation proposed by VMware
– draft-davie-stt-06.txt
• Why yet another L2 over L3 encapsulation?
– Performance
– Richer context information
– Multipath
– Software oriented
TSO (TCP Segmentation Offload)
• Modern NICs (shipped within the last 4-5 years) come with various hardware acceleration features:
– RSS, GSO/TSO, Checksum Offload, etc.
• With TSO, the NIC performs TCP segmentation on behalf of the operating system (which would otherwise do it in software)
– The operating system can now send packets of up to 64K bytes. This significantly decreases the number of packets to process (i.e. interrupts), so far fewer context switches are needed.
• To take advantage of TSO in the NIC, STT encapsulates packets so that they look like “TCP”!
Encapsulation / Segmentation in STT
Before segmentation (header sizes in bytes): IP (20) | TCP’ (20) | STT (18) | L2 Frame (up to 64K)
Segmentation by hardware:
IP (20) | TCP’ (20) | STT (18) | Payload 1
IP (20) | TCP’ (20) | Payload 2
・・・・
IP (20) | TCP’ (20) | Payload n
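The segmentation picture can be sketched in a few lines. This is only an illustration under stated assumptions: the header sizes come from the slide, the MSS of 1460 bytes in the usage note is an assumed value, and the function name is hypothetical. The NIC sees the STT header plus the L2 frame as one large TCP payload and cuts it into MSS-sized pieces, so only the first segment carries the STT header.

```python
IP_HDR, TCP_HDR, STT_HDR = 20, 20, 18  # header sizes in bytes, from the slide


def stt_segments(frame_len, mss):
    """Return a list of (payload_bytes, wire_bytes) per segment (sketch)."""
    # What the NIC treats as TCP payload: STT header followed by the L2 frame.
    total = STT_HDR + frame_len
    segments = []
    for offset in range(0, total, mss):
        payload = min(mss, total - offset)
        # Each segment gets its own IP and TCP-like header from the NIC.
        segments.append((payload, IP_HDR + TCP_HDR + payload))
    return segments
```

With an assumed MSS of 1460, a 64000-byte L2 frame becomes 44 segments: 43 full segments of 1500 bytes on the wire and a final one of 1278 bytes, while the host stack handled a single large packet.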
TCP-like Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Sequence Number(*)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Acknowledgment Number(*)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields marked as * are repurposed in STT
STT Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version      | Flags         |  L4 Offset    |  Reserved     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Max. Segment Size          | PCP |V|     VLAN ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                     Context ID (64 bits)                      +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Padding                   |    data                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
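The 18-byte size quoted for the STT header follows directly from the fields in the diagram, which a short sketch can confirm. The function name and its defaults are illustrative assumptions; only the field order and widths are taken from the diagram.

```python
import struct


def pack_stt_header(context_id, mss, l4_offset=0, flags=0,
                    vlan_id=None, pcp=0, version=0):
    """Build the 18-byte STT header laid out in the diagram (sketch)."""
    if not 0 <= context_id < 2**64:
        raise ValueError("Context ID must fit in 64 bits")
    if vlan_id is None:
        tci = 0  # V bit clear: no VLAN tag information carried
    else:
        if not 0 <= vlan_id < 2**12 or not 0 <= pcp < 2**3:
            raise ValueError("VLAN ID is 12 bits, PCP is 3 bits")
        tci = (pcp << 13) | (1 << 12) | vlan_id  # PCP | V | VLAN ID
    # Version, Flags, L4 Offset, Reserved (1 byte each), MSS and PCP/V/VLAN
    # (2 bytes each), Context ID (8 bytes), Padding (2 bytes) = 18 bytes.
    return struct.pack("!BBBBHHQH", version, flags, l4_offset, 0,
                       mss, tci, context_id, 0)
```

The 64-bit Context ID is what gives STT its “richer context information” compared with the 24-bit identifiers of VXLAN and NVGRE.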
Throughput and CPU Utilization
[Figure: chart of throughput (Gbps, axis 0-10) and CPU utilization on receive and send (%, axis 0-100) for Linux Bridge, OVS Bridge, OVS-GRE, and OVS-STT]
Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/
Multipath in STT
Encapsulated packet (header sizes in bytes): IP (20) | TCP’ (20) | STT (18) | original packet
Original packet: Ether | IP | TCP | Data
dst port = 7471 (TBD)
src port = Hash (src/dst MAC addr, src/dst IP addr, src/dst port number, etc. of the original packet) *
* Which fields to hash or which hash algorithm to use is not defined by the protocol. It is up to the implementation.
Geneve (Generic Network Virtualization Encapsulation)
• New encapsulation being proposed by VMware, Microsoft, Red Hat, Intel
– draft-gross-geneve-00.txt
• Goals
– Extensibility
• Service Chaining, Metadata support, etc.
– Leverage NIC offload
– The above two at the same time! (each one is straightforward, but both at the same time is difficult)
• Highlights
– Information can be added as Option fields in TLV format
– Format carefully designed so that the NIC can perform TSO
– OAM and Criticality (indicating that parsing the option fields is mandatory)
Geneve Header & Option Header
Geneve Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Virtual Network Identifier (VNI)       |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Variable Length Options                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Option
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Option Class         |      Type     |R|R|R| Length  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Variable Option Data                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
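To make the TLV idea concrete, here is a minimal sketch that packs a Geneve header with one option, following the two diagrams. The function names and the example option class/type values are hypothetical; both length fields are counted in 4-byte multiples, with the option's Length excluding its own 4-byte option header.

```python
import struct


def pack_geneve_option(opt_class, opt_type, data=b""):
    """Pack one TLV option; data must be a multiple of 4 bytes (sketch)."""
    if len(data) % 4 or len(data) // 4 >= 2**5:
        raise ValueError("option data must be 4-byte aligned and fit 5 bits")
    # Option Class (16 bits), Type (8 bits), R R R + Length (5 bits).
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def pack_geneve_header(vni, options=b"", protocol_type=0x6558,
                       oam=False, critical=False):
    """Pack the fixed 8-byte Geneve header followed by its options."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    if len(options) % 4 or len(options) // 4 >= 2**6:
        raise ValueError("options must be 4-byte aligned and fit 6 bits")
    byte0 = (0 << 6) | (len(options) // 4)      # Ver = 0, Opt Len
    byte1 = (int(oam) << 7) | (int(critical) << 6)  # O and C flags, Rsvd = 0
    return struct.pack("!BBHI", byte0, byte1, protocol_type, vni << 8) + options
```

A parser that does not understand an option can still skip it using the Length field, which is what lets new metadata be added without changing the fixed header.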
Geneve Implementation
• Recently implemented in Open vSwitch (OVS) and merged into the master branch on GitHub
– VNI can be specified
– Geneve Options can’t be specified (at this point)
– The OAM flag can’t be set?? (I tried, but it didn’t work)
– It looks like the Critical flag is supported as long as critical options are present
• A Geneve dissector for Wireshark has also been implemented and merged into the master branch on GitHub
• Geneve-aware NICs are not available yet
Running Geneve on Open vSwitch
host-1:~$ sudo ovs-vsctl add-br br0
host-1:~$ sudo ovs-vsctl add-br br1
host-1:~$ sudo ovs-vsctl add-port br0 eth0
host-1:~$ sudo ifconfig eth0 0
host-1:~$ sudo dhclient br0
host-1:~$ sudo ifconfig br1 10.0.0.1 netmask 255.255.255.0
host-1:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface \
    geneve1 type=geneve options:remote_ip=192.168.203.149

host-2:~$ sudo ovs-vsctl add-br br0
host-2:~$ sudo ovs-vsctl add-br br1
host-2:~$ sudo ovs-vsctl add-port br0 eth0
host-2:~$ sudo ifconfig eth0 0
host-2:~$ sudo dhclient br0
host-2:~$ sudo ifconfig br1 10.0.0.2 netmask 255.255.255.0
host-2:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface \
    geneve1 type=geneve options:remote_ip=192.168.203.151
Dissecting Geneve Packets by Wireshark (1)
Dissecting Geneve Packets by Wireshark (2)
Information about Geneve
• English
– http://tools.ietf.org/html/draft-gross-geneve-00
– http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
– http://www.enterprisenetworkingplanet.com/netsp/geneve-generic-network-virtualization-encapsulation-protocol-advances-video.html
– http://searchsdn.techtarget.com/news/2240219051/VMware-Microsoft-end-encapsulation-protocol-turf-war-with-GENEVE
– http://www.plexxi.com/2014/06/attention-overlay-tunnel-construction-ahead
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/
• Japanese
– http://blog.shin.do/2014/05/geneve-encapsulation/
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/
Will Geneve replace VXLAN / STT / NVGRE?
• Will Geneve replace VXLAN?
– NO
– The VXLAN ecosystem has already grown big enough that it is unlikely to be replaced by something else
– VMware will continue to support VXLAN and ecosystem partners
• Will Geneve replace STT?
– In the short term, NO. In the long run, maybe, if
• Geneve is accepted by the market and Geneve-aware NICs become widely available, at the same level as STT today.
• Will Geneve replace NVGRE?
– In the short term, NO. In the long run, maybe, if
• Geneve gets implemented on Windows and an ecosystem forms at the same level as NVGRE’s today.
Encapsulation is like a wire: the right cable in the right place
http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
The world is not that simple
• Some people are against Geneve
• Their claims are more or less as follows:
– What Geneve tries to accomplish can be achieved by an existing encapsulation (such as L2TP static tunneling or VXLAN), as is or with a small extension!?
– Service Chaining and metadata should not be bound to a particular encapsulation; they should be independent of the encapsulation!?
– A 24-bit VNI is not long enough!?
L2TPv3 static tunneling
• Being a tunneling protocol, L2TPv3 inherently has signaling. That said, it can be used as a plain encapsulation method (i.e. a pseudowire) without signaling. This is called “L2TPv3 static tunneling”, where the configuration is done manually at both ends.
• L2TPv3 became an RFC in 2005 (RFC 3931) and has been on the market for many years. Cisco IOS and Linux (l2tpd) support L2TPv3 static tunneling.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T|x|x|x|x|x|x|x|x|x|x|x|  Ver  |          Reserved             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Session ID                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Cookie (optional, maximum 64 bits)...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
L2TPv3 static tunneling as an L2 over L3 encapsulation
• Session ID (32 bits) corresponds to VNI
• L2TPv3 can be transported directly over IP or over UDP. For multipath, UDP would be better.
• No explicit field for context information (metadata, etc.). It has to be configured manually on both ends (if possible) and expressed implicitly as part of the Session ID
– Therefore the 32-bit Session ID can’t be used entirely for VNI
• Strictly speaking, there is no way in L2TPv3 to tell (in the packet) where the subsequent packet starts so that the NIC can do TSO. However, L2TPv2 had an “offset” option for this purpose. Many L2TPv3 implementations still have this “offset” option for backward compatibility with L2TPv2, so TSO is possible (if the NIC understands this legacy option). Cisco and Linux l2tpd support the offset field.
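The point about the Session ID not being entirely available for the VNI can be illustrated with a small sketch. The 24/8 split below is purely hypothetical, chosen only for illustration; L2TPv3 itself treats the Session ID as an opaque 32-bit value, and both ends would have to agree on any such convention by manual configuration.

```python
# Hypothetical split of the 32-bit Session ID; not defined by L2TPv3.
VNI_BITS = 24
CONTEXT_BITS = 8


def encode_session_id(vni, context):
    """Fold a VNI and context bits into one 32-bit Session ID (sketch)."""
    if not 0 <= vni < 2**VNI_BITS:
        raise ValueError("VNI does not fit in the bits reserved for it")
    if not 0 <= context < 2**CONTEXT_BITS:
        raise ValueError("context does not fit in the bits reserved for it")
    return (vni << CONTEXT_BITS) | context


def decode_session_id(session_id):
    """Split a Session ID back into (vni, context) under the same convention."""
    return session_id >> CONTEXT_BITS, session_id & (2**CONTEXT_BITS - 1)
```

Every bit given to context information is a bit taken from the VNI space, which is the trade-off the slide describes.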
VXLAN Generic Protocol Extension (a.k.a. eVXLAN)
• Proposed by Cisco, Huawei, Intel, Microsoft
– draft-quinn-vxlan-gpe-03.txt
• An extension to VXLAN
– Support for protocols other than Ethernet
• IPv4 (0x01), IPv6 (0x02), Ethernet (0x03), Network Service Header [NSH] (0x04)
– Note that “Next Protocol” is only 8 bits wide. The protocol type (usually 16 bits) has to be specifically encoded to fit into 8 bits.
– OAM support
– Version field
• Used by Cisco ACI
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|P|R|O|Ver|   Reserved                |Next Protocol  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
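A minimal sketch of the VXLAN-gpe header as drawn in the diagram for draft-quinn-vxlan-gpe-03, using the Next Protocol codes listed on the slide. The function and dictionary names are illustrative assumptions, and the bit positions follow this particular diagram only.

```python
import struct

# Next Protocol codes as listed on the slide.
NEXT_PROTOCOL = {"ipv4": 0x01, "ipv6": 0x02, "ethernet": 0x03, "nsh": 0x04}

# Flag bits in the first byte: R R R R I P R O
FLAG_I = 0x08  # VNI is valid
FLAG_P = 0x04  # Next Protocol field is present
FLAG_O = 0x01  # OAM packet


def pack_vxlan_gpe_header(vni, next_protocol, oam=False, version=0):
    """Build the 8-byte VXLAN-gpe header from the diagram (sketch)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= version < 4:
        raise ValueError("Ver is a 2-bit field")
    flags = FLAG_I | FLAG_P | (FLAG_O if oam else 0)
    # Word 1: flags (8) | Ver (2) | Reserved (14) | Next Protocol (8).
    word1 = (flags << 24) | (version << 22) | NEXT_PROTOCOL[next_protocol]
    return struct.pack("!II", word1, vni << 8)
```

Compared with plain VXLAN, only bits that were previously reserved change meaning, which is why the two formats share the same 8-byte size and VNI position.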
VXLAN-gpe as L2 over L3 encapsulation
• Mostly identical to VXLAN
– VNI length (24 bits)
– Multipath property
– Hardware friendliness
• The biggest motivation for VXLAN-gpe is probably to enable Service Chaining with NSH (Network Service Header)
• No further extensibility
Thank You!