DPDK Based Networking Products Enhance and Expand Container Networking

[email protected] Jingdong Digital Technology


Page 2

Kubernetes Overview

[Figure: a master running kube-apiserver, kube-controller-manager and kube-scheduler, plus two worker nodes (node1, node2). Each node runs kubelet and kube-proxy and hosts some of the pods pod1 to pod6.]

• Pod to Pod communication
• Pod to Service communication

Page 3

Flannel Overview

VXLAN encapsulation

[Figure: node1 and node2 connected by the underlying network.]

node1: eth0 192.168.0.1
  vxlan: flannel.1 10.10.10.0/32
  bridge: cni0 10.10.10.1/24
  pod1: eth0 10.10.10.2/24
  pod2: eth0 10.10.10.3/24

node2: eth0 192.168.0.2
  vxlan: flannel.1 10.10.20.0/32
  bridge: cni0 10.10.20.1/24
  pod3: eth0 10.10.20.2/24
  pod4: eth0 10.10.20.3/24

Encapsulated packet on the underlying network:
Outer Ethernet header | Outer IP header (src: 192.168.0.1, dst: 192.168.0.2) | Outer UDP header | VXLAN header | Inner Ethernet header | Inner IP header (src: 10.10.10.2, dst: 10.10.20.3) | Payload
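To make the VXLAN header in the packet layout above concrete, here is a minimal Python sketch that builds and parses the 8-byte header defined in RFC 7348. The VNI value 1 is an assumption suggested by the interface name flannel.1; the slides do not state it. The function names are illustrative only, and the outer Ethernet, IP and UDP headers are left to the host stack.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags(1) reserved(3) VNI(3) reserved(1)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame (UDP payload only)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(udp_payload: bytes) -> Tuple[int, bytes]:
    """Return (vni, inner_frame) from a VXLAN UDP payload."""
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, udp_payload[8:]
```

With an IPv4 underlay and no VLAN tag, the outer Ethernet (14 bytes), IP (20), UDP (8) and VXLAN (8) headers add 50 bytes to every pod-to-pod packet.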

Page 4

1. When pods communicate with endpoints inside the k8s cluster, packets must be encapsulated
2. When pods communicate with endpoints outside the k8s cluster, packets must be masqueraded

Both lead to extra overhead. Besides, they cannot meet some demands, e.g. a pod that wants to access a whitelist-enabled application outside of the k8s cluster.

Our goals:
• no encapsulation
• no network address translation
• pods can be reached from everywhere directly

[Figure: pod1 on node1, pod2 on node2, and non-Kubernetes nodes.]

Our choice:
• contiv with layer3 routing mode

Page 5

Contiv Overview

• OVS to forward pod packets
• BGP to publish pod IPs

Routes published over BGP:
10.10.0.1 nexthop 192.168.1.1
10.10.0.2 nexthop 192.168.1.1
10.10.0.3 nexthop 192.168.1.2
10.10.0.4 nexthop 192.168.1.2

[Figure: two hosts attached to a layer3 switch through eth0, each running netplugin, bgp and ovs.]

host 1: inb01 192.168.1.1
  pod1: eth0 10.10.0.1/24, attached to ovs via vvport1
  pod2: eth0 10.10.0.2/24, attached to ovs via vvport2

host 2: inb01 192.168.1.2
  pod3: eth0 10.10.0.3/24, attached to ovs via vvport1
  pod4: eth0 10.10.0.4/24, attached to ovs via vvport2
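The route list above follows mechanically from which host runs which pod: each pod IP becomes a /32 route whose next hop is the hosting node. A minimal Python sketch of that derivation, assuming a plain dict as the input format (the function name and data shape are illustrative, not Contiv's API):

```python
from ipaddress import ip_address


def host_routes(pods_by_node):
    """Return one '/32 nexthop' line per pod, sorted by pod address.

    pods_by_node maps a node address to the list of pod addresses it hosts.
    """
    routes = [
        (ip_address(pod_ip), ip_address(node_ip))
        for node_ip, pod_ips in pods_by_node.items()
        for pod_ip in pod_ips
    ]
    return [f"{pod}/32 nexthop {node}" for pod, node in sorted(routes)]
```

Calling `host_routes({"192.168.1.1": ["10.10.0.1", "10.10.0.2"], "192.168.1.2": ["10.10.0.3", "10.10.0.4"]})` reproduces the four routes on this slide, with the /32 prefix length written out.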

Page 6

Contiv Implementation Detail

1. user creates a new pod in the k8s cluster
2. netplugin requests a free ip 10.10.0.1 from netmaster
3. netplugin creates a veth pair, such as vport1 and vvport1
4. netplugin moves interface vport1 to the pod network namespace and renames it to eth0
5. netplugin sets ip and route in the pod network namespace
6. netplugin adds vvport1 to ovs
7. netplugin publishes 10.10.0.1/32 to the bgp neighbor switch
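Steps 3 to 6 above correspond roughly to a handful of standard `ip` and `ovs-vsctl` commands. The Python sketch below only builds the command lists and does not run them. It is an illustration, not how netplugin is implemented: the namespace name, the bridge name `br-pods` and the choice of a device-only default route are all assumptions, and step 7 (the BGP announcement) is left out.

```python
def pod_network_commands(pod_ip, prefix_len, netns, bridge,
                         pod_if="vport1", host_if="vvport1"):
    """Return the commands that wire one pod into OVS, in execution order."""
    in_ns = ["ip", "netns", "exec", netns]
    return [
        # step 3: create the veth pair
        ["ip", "link", "add", pod_if, "type", "veth", "peer", "name", host_if],
        # step 4: move one end into the pod namespace and rename it
        ["ip", "link", "set", pod_if, "netns", netns],
        in_ns + ["ip", "link", "set", pod_if, "name", "eth0"],
        # step 5: address and route inside the namespace
        in_ns + ["ip", "addr", "add", f"{pod_ip}/{prefix_len}", "dev", "eth0"],
        in_ns + ["ip", "link", "set", "eth0", "up"],
        in_ns + ["ip", "route", "add", "default", "dev", "eth0"],  # assumed route
        # step 6: attach the host end to the OVS bridge
        ["ip", "link", "set", host_if, "up"],
        ["ovs-vsctl", "add-port", bridge, host_if],
    ]
```

For example, `pod_network_commands("10.10.0.1", 24, "pod1-ns", "br-pods")` yields eight commands, starting with the veth creation and ending with the `ovs-vsctl add-port` call.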

• nw_dst=10.10.0.1 output:vvport1
• nw_dst=10.10.0.2 output:vvport2
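The two rules above amount to an exact-match table keyed on destination IP. A minimal Python sketch of the lookup follows; the fallback to the uplink `eth0` for non-local destinations is an assumption based on the later slide where ovs forwards an outbound packet to host eth0, and the names are illustrative.

```python
from ipaddress import ip_address

# Exact-match entries taken from the two rules on this slide.
FLOW_TABLE = {
    ip_address("10.10.0.1"): "vvport1",
    ip_address("10.10.0.2"): "vvport2",
}
UPLINK = "eth0"  # assumed default output for destinations that are not local pods


def output_port(dst_ip, flows=FLOW_TABLE, default=UPLINK):
    """Return the port a packet with this destination IP is sent to."""
    return flows.get(ip_address(dst_ip), default)
```

In this sketch the table needs only one entry per local pod, because everything else falls through to the uplink.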

[Figure: a host (inb01 192.168.1.1) running netplugin, bgp and ovs, connected through eth0 to the layer3 switch. pod1 (eth0 10.10.0.1/24) attaches to ovs via vvport1 and pod2 (eth0 10.10.0.2/24) via vvport2.]

Page 7

Pod IP is Reachable in IDC Scope

10.10.0.2 (in cluster) pings 172.16.0.1 (outside cluster)

1. pod2 sends out the packet through its eth0
   Ethernet header | IP header (src: 10.10.0.2, dst: 172.16.0.1) | Payload

2. ovs receives the packet from vvport2 and forwards it to host eth0
   Ethernet header | IP header (src: 10.10.0.2, dst: 172.16.0.1) | Payload

3. switch receives the packet and forwards it to host 172.16.0.1
   Ethernet header | IP header (src: 10.10.0.2, dst: 172.16.0.1) | Payload

In the pod, in the host, and in the underlying infrastructure, the packet's IP header is always the same.

[Figure: the host from the previous slide (inb01 192.168.1.1, netplugin, bgp, ovs with pod1 on vvport1 and pod2 on vvport2) connected through eth0 to the layer3 switch, which also connects a machine outside of the k8s cluster, 172.16.0.1.]

Page 8

Contiv Optimization

1. support multiple bgp neighbors

2. reduce the number of ovs rules on each node from cluster scale to node scale

3. remove the dns and load balance modules from netplugin

4. add support for non-docker container runtimes, e.g. containerd

5. add ipv6 support

Page 9

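Load Balance: Native KubeProxy

[Figure: control flow and data flow. Control flow: on each node, kube-proxy receives services and endpoints from kube-apiserver and programs iptables. Data flow: service traffic from the pods passes through iptables on the node. Pods shown: pod1 (eth0 10.10.0.1/24), pod2 (eth0 10.10.0.2/24) and pod3 (eth0 10.10.0.3/24), attached via vvport1 and vvport2. Host interfaces shown: inb01, eth0, eth0.100 and eth0.200, connected to the layer3 switch.]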

Page 10

Load Balance: DPDK-SLB

[Figure: control flow and data flow. Control flow: slb-controller receives services and endpoints from kube-apiserver and configures the DPDK SLB Cluster. Data flow: service traffic from the pods goes through the layer3 switch to the DPDK SLB Cluster. Pods shown: pod1 (eth0 10.10.0.1/24), pod2 (eth0 10.10.0.2/24) and pod3 (eth0 10.10.0.3/24), attached via vvport1 and vvport2. Host interfaces shown: inb01, eth0, eth0.100 and eth0.200.]

• kube-proxy is no longer needed on any node

• SLB-Controller watches services and endpoints in K8S and dynamically sends VS and RS info to DPDK-SLB
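As a rough illustration of what the SLB-Controller does with each watch event, the Python sketch below turns services and endpoints into a VS-to-RS table and computes what changed since the previous table. It reads VS and RS as virtual server and real server, which is an assumption since the slides do not expand the abbreviations. The dict shapes and function names are hypothetical, and no real Kubernetes or DPDK-SLB API is used.

```python
def build_virtual_servers(services, endpoints):
    """Map each service to its VS key and sorted RS list.

    services:  name -> {"cluster_ip": str, "port": int, "protocol": str}
    endpoints: name -> list of (pod_ip, pod_port)
    """
    table = {}
    for name, svc in services.items():
        vs = (svc["cluster_ip"], svc["port"], svc["protocol"])
        table[vs] = sorted(endpoints.get(name, []))
    return table


def plan_updates(old, new):
    """Return (removed VS keys, VS -> RS list entries that must be pushed)."""
    removed = [vs for vs in old if vs not in new]
    changed = {vs: rs for vs, rs in new.items() if old.get(vs) != rs}
    return removed, changed
```

Pushing only the difference keeps updates small when a single pod behind a service is added or removed.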

Page 11

DPDK-SLB: Control Plane

• SLB-Daemon: the core process, which does load balancing and full NAT

• SLB-Agent monitors and configures SLB-Daemon

• OSPFD publishes service subnets to the layer3 switch

[Figure: slb-controller talks to slb-agent, which configures slb-daemon. Inside slb-daemon there are an admin core, a kni core and worker cores worker_1 to worker_n, each worker holding its own config. ospfd runs alongside slb-daemon.]

• Admin core configures VS and RS info to worker cores

• KNI core forwards OSPF packets to the kernel, which then sends them to OSPFD

• Worker cores do the load balancing

All data (config data, session data, local addrs) is per CPU, which fully parallelizes packet processing.
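The per-CPU layout can be pictured as each worker owning private copies of everything it touches, with the admin core handing over config by message rather than through shared state. The Python sketch below models that idea with plain objects. The slides do not say how the admin core delivers config to workers, so the per-worker inbox is an assumption, and all class and method names are illustrative.

```python
from collections import deque


class Worker:
    """One worker core with private config and session state."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = deque()   # config messages from the admin core
        self.config = {}       # private copy: VS -> list of RS
        self.sessions = {}     # private session table, never shared

    def poll_config(self):
        """Apply pending config messages; called from the worker's own loop."""
        while self.inbox:
            vs, rs_list = self.inbox.popleft()
            self.config[vs] = list(rs_list)


class Admin:
    """Admin core: fans every config change out to all workers."""

    def __init__(self, workers):
        self.workers = workers

    def set_virtual_server(self, vs, rs_list):
        for worker in self.workers:
            worker.inbox.append((vs, tuple(rs_list)))
```

Because no worker reads another worker's tables, the packet path needs no locks in this model.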

Page 12

DPDK-SLB: OSPF Neighbor

• OSPF uses multicast address 224.0.0.5

• Flow Director: destination ip 224.0.0.5 is bound to queue_x

• A dedicated KNI core processes OSPF packets

• OSPFD publishes service subnets to the layer3 switch

[Figure: OSPF packets from the layer3 switch arrive on eth0, are steered to queue_x and handled by the kni core, which passes them on to ospfd. queue_1 to queue_n feed worker_1 to worker_n, each holding its own config. slb-agent configures slb-daemon.]
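The queue selection described on this slide can be sketched as a two-stage decision: an exact match on the OSPF multicast address first, then a hash over the flow for everything else. In the Python sketch below, `zlib.crc32` is only a stand-in for the NIC's RSS hash, and the function name and queue naming are illustrative.

```python
import zlib
from ipaddress import ip_address

OSPF_MULTICAST = ip_address("224.0.0.5")


def select_queue(dst_ip, flow_key: bytes, n_worker_queues: int) -> str:
    """Pick the receive queue for a packet.

    OSPF multicast goes to the dedicated queue_x (served by the KNI core);
    all other traffic is spread over queue_1..queue_n by a flow hash.
    """
    if ip_address(dst_ip) == OSPF_MULTICAST:
        return "queue_x"
    return f"queue_{zlib.crc32(flow_key) % n_worker_queues + 1}"
```

Keeping routing-protocol packets off the worker queues means a burst of service traffic cannot starve the OSPF adjacency in this model.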

Page 13

DPDK-SLB: Data Plane

client to server (rss):
1. packet arrives with {client_ip, client_port, vip, vport}
2. rss selects a queue according to the 5-tuple
3. worker_1 does fullnat to {local_ip1, local_port, server_ip, server_port}
4. worker_1 saves session {cip, cport, vip, vport, lip1, lport, sip, sport}

server to client (fdir):
1. packet arrives with {server_ip, server_port, local_ip1, local_port}
2. fdir selects a queue according to the destination ip addr (local_ip1 is bound to queue_1)
3. worker_1 looks up session {cip, cport, vip, vport, lip1, lport, sip, sport}
4. worker_1 does fullnat to {vip, vport, client_ip, client_port}

[Figure: in each direction the nic places packets on queue_1 to queue_n, which are served by worker_1 to worker_n.]

The key point is that the server-to-client packet must be placed on queue_1, because only worker_1 has the session.
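The two directions above can be modelled in a few lines of Python. In this sketch each worker owns one local IP, the client-to-server queue is chosen by a hash over the 5-tuple, and the return queue is chosen by destination IP, so the reply reaches the worker that holds the session. `zlib.crc32` stands in for the NIC's RSS hash, the class and method names are illustrative, and port exhaustion and session expiry are ignored.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Worker:
    local_ip: str                      # source address owned by this worker only
    next_port: int = 1024
    sessions: dict = field(default_factory=dict)

    def client_to_server(self, cip, cport, vip, vport, backend):
        """Full NAT towards the server and record the session."""
        sip, sport = backend
        lport = self.next_port
        self.next_port += 1
        # Index the session by the tuple the reply packet will carry.
        self.sessions[(sip, sport, self.local_ip, lport)] = (cip, cport, vip, vport)
        return (self.local_ip, lport, sip, sport)

    def server_to_client(self, sip, sport, lip, lport):
        """Look up the session and full NAT back towards the client."""
        cip, cport, vip, vport = self.sessions[(sip, sport, lip, lport)]
        return (vip, vport, cip, cport)


class Nic:
    def __init__(self, workers):
        self.workers = workers
        self.fdir = {w.local_ip: w for w in workers}  # local_ip bound to one queue

    def rss(self, proto, src_ip, src_port, dst_ip, dst_port):
        key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
        return self.workers[zlib.crc32(key) % len(self.workers)]

    def flow_director(self, dst_ip):
        return self.fdir[dst_ip]
```

Binding each local IP to exactly one queue is what lets the session table stay private to a worker: the reply's destination address alone identifies the owner.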

Page 14

Make Apps Run in the Container Cloud Seamlessly

• layer3 switch routes:
  10.10.0.1 nexthop node1
  10.10.0.4 nexthop node2
  service subnets nexthop dpdk-slb

• Pod IPs are reachable from vm1 outside the k8s cluster

• Service IPs are reachable from vm2 outside the k8s cluster

• Helps apps run in the container cloud and in the traditional environment at the same time

[Figure: a layer3 switch connecting node1 (ovs; pod1 10.10.0.1 on vvport1, pod2 10.10.0.2 on vvport2), node2 (ovs; pod3 10.10.0.3 on vvport1, pod4 10.10.0.4 on vvport2), two DPDK-SLB instances, and vm 1 and vm 2 outside the k8s cluster.]
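The switch's decision for a packet from vm1 or vm2 can be sketched as a longest-prefix-match lookup over the routes listed on this slide. In the Python sketch below the service subnet 10.20.0.0/16 is a hypothetical value, since the slides do not give one, and the next hops are kept as the labels used on the slide.

```python
from ipaddress import ip_address, ip_network

ROUTES = [
    (ip_network("10.10.0.1/32"), "node1"),
    (ip_network("10.10.0.4/32"), "node2"),
    (ip_network("10.20.0.0/16"), "dpdk-slb"),  # hypothetical service subnet
]


def next_hop(dst, routes=ROUTES):
    """Return the next hop of the most specific matching route, or None."""
    addr = ip_address(dst)
    matches = [(net.prefixlen, hop) for net, hop in routes if addr in net]
    return max(matches)[1] if matches else None
```

Here `next_hop("10.10.0.1")` resolves to node1, while any address inside the assumed service subnet resolves to dpdk-slb.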

Page 15

Thank You!

Q & A
