[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud Performance and...
Sr. SE 정연구
Feb. 18, 2016
Mellanox CloudX: Enhanced Cloud Performance Acceleration and Efficient Virtual Networking
Leading Supplier of End-to-End Interconnect Solutions
Software and Services · ICs · Switches/Gateways · Adapter Cards · Cables/Modules · Metro/WAN
Enabling the use of data, store and analyze, at speeds of 10, 25, 40, 50, 56, and 100 gigabits per second
Comprehensive end-to-end InfiniBand and Ethernet portfolio
The Future Depends on the Fastest Interconnects
1Gb/s → 10Gb/s → 100Gb/s
The Future Depends on Mellanox
25Gb/s is the new 10, 50 is the new 40, and 100 is the Present
Flexibility, Opportunities, Speed
Open Ethernet, Zero Packet Loss
Most Cost-Effective Ethernet Adapter
Same Infrastructure, Same Connectors
One Switch. A World of Options. 25, 50, 100Gb/s at Your Fingertips
Spectrum – 25GbE, 50GbE and 100GbE Open Ethernet Switch
Leading performance and power
• World's only non-blocking 100GbE switch: 6.4Tb/s
• Sub-300ns port-to-port latency
• RDMA over Converged Ethernet
• Lowest power (<135W)
Cloud-scale
• Network virtualization at scale
• Bandwidth optimization
• Flexible SDN pipeline
Key features
• 32 ports of 40/100GbE
• 64 ports of 10/25/50GbE
• Advanced QoS and congestion control
• Dynamically shared buffer
One Switch. A World of Options. Zero Packet Loss for All Packet Sizes.
ConnectX-4 Lx: Affordable 25GbE Performance
Affordable 25GbE / 50GbE Performance
• 2.5x the performance vs. 10GbE with the same connectors and infrastructure
• 0.7µs latency, 75 million messages per second
• The standard against which other Ethernet adapters will be compared
Optimized to deliver higher ROI
• Public and private Clouds
• Hyperscale Web 2.0 infrastructures
• Cost Effective Big Data and Analytics systems
Advanced features
• Advanced Network Virtualization & Overlay Networks Offloads
- VXLAN, NVGRE, GENEVE
• RDMA over Converged Ethernet (RoCE)
• Multi-Host Technology
• Fully featured embedded switch (eSwitch)
- Hardware acceleration of Open vSwitch
- VM traffic steering, monitoring, and enforcement
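These offloads surface as ordinary Linux NIC features, so it is easy to verify what a given adapter actually accelerates. A minimal sketch, assuming a Linux host with ethtool installed; the interface name is a placeholder, and the exact feature names reported depend on the driver:

```python
#!/usr/bin/env python3
"""Check a NIC's tunnel-offload features via ethtool.

The interface name is a placeholder; feature names depend on the
driver (mlx4/mlx5 expose tx-udp_tnl-segmentation for VXLAN-style
UDP tunnels).
"""
import subprocess
import sys

def tunnel_offloads(iface: str) -> dict:
    # `ethtool -k` lists all offload features and their on/off state.
    out = subprocess.run(
        ["ethtool", "-k", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    feats = {}
    for line in out.splitlines():
        if ":" in line:
            name, _, state = line.partition(":")
            feats[name.strip()] = state.strip()
    # Keep only the tunnel-related offloads.
    return {k: v for k, v in feats.items() if "udp_tnl" in k or "gre" in k}

if __name__ == "__main__":
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    for name, state in tunnel_offloads(iface).items():
        print(f"{name}: {state}")
```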
Converged Infrastructure Relies on Efficient Data Movement
Efficient Data Movement
• Multi-Host & eSwitch: embedded hardware OVS switch with an advanced flow steering engine
• Virtual network acceleration (VXLAN, NVGRE, GENEVE)
• RDMA – Efficient Data Exchange - Low Latency, Low CPU Overhead
[Diagram: VMs on multiple CPUs attached to an embedded hardware OVS switch, with RDMA data exchange and virtual overlay network (NVGRE/VXLAN/GENEVE) acceleration in the NIC]
CloudX: Optimized Cloud Platform
CloudX is a group of reference architectures for building the most efficient, high-performance, and scalable Infrastructure-as-a-Service (IaaS) clouds, based on Mellanox's superior interconnect and off-the-shelf building blocks.
Supports the most popular cloud software
• VMware
• OpenStack
• Windows Azure Pack (WAP)
100Gbps Cloud!
Based on the Mellanox ConnectX-4 NIC family and Switch-IB/Spectrum switches
Brings 100Gb/s speeds to OpenStack
• For both VMs and hypervisors
• Accelerations are critical to reach line rate
- RDMA, overlay offloads, etc.
Overlay Networks (VXLAN/NVGRE/GENEVE) Acceleration
Overlay Network Virtualization: Isolation, Simplicity, Scalability
[Diagram: physical view of two servers (VM1-VM4 and VM5-VM8) connected through Mellanox SDN switches and routers, and the corresponding virtual view of three isolated virtual domains]
Virtual overlay networks (NVGRE/VXLAN) simplify management and VM migration
ConnectX-3 Pro overlay accelerators enable bare-metal performance
Advantages of Overlay Networks
• Simplification
• Automation
• Scalability
Problem: Performance Impact!!
• Overlay tunnels add network processing
- Limits bandwidth
- Consumes CPU
Solution:
• Overlay Network Accelerators in NIC
• Penalty free overlays at bare-metal speed
• HW encap/decap (future)
Turbocharge Overlay Networks with ConnectX-3/4 NICs
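To make the performance problem concrete: without NIC offload, the kernel wraps every overlay packet in an outer UDP/IP header and does the checksums and segmentation behind it in software. A minimal sketch, assuming root privileges, that creates a software VXLAN tunnel endpoint with iproute2; all names, the VNI, and the addresses are placeholders:

```python
#!/usr/bin/env python3
"""Create a software VXLAN tunnel endpoint (VNI 42) with iproute2.

All names and addresses are placeholders; run as root. With an
offload-capable NIC the same tunnel traffic can be segmented and
checksummed in hardware instead of on the CPU.
"""
import subprocess

def sh(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Outer (underlay) device carrying the encapsulated UDP traffic.
UNDERLAY = "eth0"  # placeholder
sh("ip", "link", "add", "vxlan42", "type", "vxlan",
   "id", "42",            # VNI: the 24-bit virtual network identifier
   "dev", UNDERLAY,
   "dstport", "4789")     # IANA-assigned VXLAN UDP port
sh("ip", "addr", "add", "192.168.42.1/24", "dev", "vxlan42")
sh("ip", "link", "set", "vxlan42", "up")
```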
Single Root I/O Virtualization (SR-IOV)
A PCIe device presents multiple instances to the OS/hypervisor
Enables direct application access
• Reduces CPU overhead and improves application performance
Enables RDMA to the VM
• Low-latency applications benefit from the virtual infrastructure
[Charts: RoCE over SR-IOV latency (µs) for 1/2/4/8 VMs at 2B, 16B, and 32B message sizes, and RoCE over SR-IOV throughput (Gb/s) for 1 to 16 VMs]
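On Linux, SR-IOV virtual functions are typically instantiated through the physical function's sysfs node, and each VF then appears as its own PCI device that can be handed to a VM. A minimal sketch, assuming root, an SR-IOV capable NIC with SR-IOV enabled in firmware, and a placeholder interface name:

```python
#!/usr/bin/env python3
"""Enable SR-IOV VFs through the standard sysfs interface.

Requires root, an SR-IOV capable NIC, and SR-IOV enabled in
firmware/BIOS. The interface name is a placeholder.
"""
from pathlib import Path

IFACE = "eth0"  # placeholder physical function
dev = Path(f"/sys/class/net/{IFACE}/device")

total = int((dev / "sriov_totalvfs").read_text())
print(f"{IFACE} supports up to {total} VFs")

# Writing N to sriov_numvfs asks the driver to instantiate N VFs.
(dev / "sriov_numvfs").write_text("8")

# Each VF appears as a virtfnN symlink to its own PCI device.
for vf in sorted(dev.glob("virtfn*")):
    print(vf.name, "->", vf.resolve().name)
```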
Value proposition: Remote Direct Memory Access (RDMA)
ZERO Copy Remote Data Transfer
Low Latency, High Performance Data Transfers
InfiniBand: 56Gb/s; RoCE*: 40/56Gb/s
Kernel bypass, protocol offload
* RDMA over Converged Ethernet
[Diagram: application-to-application transfer where buffers move directly between hosts in hardware, bypassing the kernel on both sides]
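The zero-copy, kernel-bypass path is easy to exercise with the standard perftest suite that ships with most RDMA stacks. A minimal sketch, assuming ib_write_bw is installed on both hosts and RDMA connectivity (InfiniBand or RoCE) is up; the server address is a placeholder:

```python
#!/usr/bin/env python3
"""Measure RDMA write bandwidth with the perftest suite.

Assumes `ib_write_bw` (from the perftest package) is installed on
both hosts. The server side runs `ib_write_bw` with no arguments;
the address below is a placeholder for that host.
"""
import subprocess
import sys

SERVER = "10.0.0.2"  # placeholder server address

# Client side: connect to the waiting server and stream RDMA writes.
# The transfer bypasses both kernels' network stacks entirely.
result = subprocess.run(
    ["ib_write_bw", "--size", "65536", SERVER],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
sys.exit(result.returncode)
```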
RDMA Provides the Fastest OpenStack Storage Access
Using OpenStack's built-in components and management (Open-iSCSI, the tgt target, Cinder), no additional software is required: RDMA support is already inbox and in use by our OpenStack customers!
[Diagram: compute servers (KVM hypervisor, Open-iSCSI with iSER) connected over the switching fabric to storage servers (iSCSI/iSER tgt target, local disks, RDMA cache), orchestrated by OpenStack Cinder; RDMA is used to accelerate iSCSI storage]
[Chart: bandwidth (MB/s) vs. I/O size (KB) for iSER writes with 4/8/16 VMs against iSCSI writes with 8/16 VMs; iSER saturates at the PCIe limit, roughly 6x the iSCSI bandwidth]
RDMA enables 6x more bandwidth, 5x lower I/O latency, and lower CPU utilization
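Because iSER rides on the Open-iSCSI stack, moving a block attach from TCP to RDMA amounts to selecting the iser transport at login. A minimal sketch, assuming root and the iSER initiator module is loaded; the target IQN and portal are placeholders:

```python
#!/usr/bin/env python3
"""Attach an iSCSI target over RDMA (iSER) with open-iscsi.

The target IQN and portal address are placeholders; requires root
and the ib_iser initiator module. Only the transport changes vs.
plain TCP iSCSI; the rest of the stack is identical.
"""
import subprocess

TARGET = "iqn.2016-02.com.example:volume1"  # placeholder
PORTAL = "10.0.0.2:3260"                    # placeholder

def iscsiadm(*args):
    subprocess.run(["iscsiadm", *args], check=True)

# Discover targets on the portal, then log in using the iser transport.
iscsiadm("-m", "discovery", "-t", "sendtargets", "-p", PORTAL)
iscsiadm("-m", "node", "-T", TARGET, "-p", PORTAL,
         "--interface=iser", "--login")
```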
CEPH and Networks
High performance networks enable maximum cluster availability
• Clients, OSD, Monitors and Metadata servers communicate over multiple network layers
• Real-time requirements for heartbeat, replication, recovery and re-balancing
Cluster (“backend”) network performance dictates the cluster's performance and scalability
• “Network load between Ceph OSD Daemons easily dwarfs the network load between Ceph Clients
and the Ceph Storage Cluster” (Ceph Documentation)
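Ceph splits these traffic classes with two ceph.conf options, putting replication, recovery, and re-balancing on a separate (ideally faster) fabric. A minimal sketch that renders the relevant [global] section; both subnets are placeholders:

```python
#!/usr/bin/env python3
"""Render the ceph.conf network split: public (client) traffic vs.
cluster (replication/recovery) traffic on a faster backend fabric.

Both subnets are placeholders for the deployment's actual networks.
"""
PUBLIC_NET = "192.168.10.0/24"   # placeholder: client <-> OSD/MON traffic
CLUSTER_NET = "192.168.20.0/24"  # placeholder: OSD <-> OSD backend traffic

snippet = f"""[global]
public network = {PUBLIC_NET}
cluster network = {CLUSTER_NET}
"""
print(snippet)
```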
Accelio, High-Performance Reliable Messaging and RPC Library
Open source!
• https://github.com/accelio/accelio/ and www.accelio.org
Faster RDMA integration into applications
Asynchronous
Maximizes message and CPU parallelism
Enables >10GB/s from a single node
Enables <10µs latency under load
Accelerating Ceph with RDMA – Work in Progress
[Chart: Ceph read IOPS, TCP (140K IOPS) vs. RDMA (434K IOPS)]
Roadmap
Virtual Switch Offload
Virtual switches are used as the forwarding plane in the hypervisor.
Virtual switches implement extensive support for SDN (e.g., policy enforcement) and are widely used across the industry.
SR-IOV technology allows direct connectivity to the NIC; as such, it bypasses the virtual switch and the policies it can enforce.
Goal
• Enable an SR-IOV data plane with an OVS control plane
- In other words, enable support for most SDN controllers with an SR-IOV data plane
• Offload OVS flow handling (classification, forwarding, etc.) to the Mellanox eSwitch
[Diagram: VMs attached via SR-IOV directly to the NIC's embedded switch, bypassing the hypervisor vSwitch and its tap devices]
Roadmap
Open vSwitch – In a Nutshell
Forwarding
• Flow-based forwarding
• The decision about how to process a packet is made in user space
• The first packet of a new flow is directed to ovs-vswitchd; subsequent packets hit the cached entry in the kernel
OVS overview: http://openvswitch.org/slides/OpenStack-131107.pdf
Roadmap
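Both forwarding tiers can be inspected from the command line, which makes the first-packet/cached-packet split above visible. A minimal sketch, assuming root and a running OVS; the bridge name is a placeholder:

```python
#!/usr/bin/env python3
"""Inspect Open vSwitch's two forwarding tiers.

Requires root and a running OVS. The bridge name is a placeholder.
`ovs-ofctl dump-flows` shows the OpenFlow rules ovs-vswitchd works
from; `ovs-dpctl dump-flows` shows the kernel datapath's cached
flows that packets after the first one hit.
"""
import subprocess

BRIDGE = "br-int"  # placeholder

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("ovs-ofctl", "dump-flows", BRIDGE)  # user-space (slow path) rules
run("ovs-dpctl", "dump-flows")          # kernel (fast path) flow cache
```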
Virtual Switch Offload - Solution
Solution
• Use Open vSwitch as the management interface and control plane for the embedded switch (eSwitch)
Motivation
• Enable an easy, friendly, well-known, and community-accepted management framework for the eSwitch
• Leverage Open vSwitch's control plane and SDN capabilities to control the eSwitch forwarding plane
[Diagram: OVS-eSwitch in software managing the hardware eSwitch through per-VF netdev representor ports; VM traffic is forwarded in hardware between VFs and the PF (wire), with a host exception path up to user space]
Roadmap
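For orientation on where this roadmap item landed: mainline Open vSwitch later added a single configuration switch for offloading flow handling to capable NICs. A sketch, assuming an OVS build with hardware-offload support, which postdates this talk:

```python
#!/usr/bin/env python3
"""Turn on OVS hardware offload (the knob later mainline OVS
releases added for offloading flow handling to a NIC's embedded
switch).

Assumes an OVS build with hw-offload support and a capable NIC;
ovs-vswitchd must be restarted for the setting to take effect.
"""
import subprocess

subprocess.run(
    ["ovs-vsctl", "set", "Open_vSwitch", ".",
     "other_config:hw-offload=true"],
    check=True,
)
```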
Containers
Containers provide lightweight and efficient virtualization
In host networking, Linux cgroups are used to segregate traffic
Mellanox will enable advanced network services for containers
• RDMA: enable RDMA for InfiniBand and RoCE configurations
• SR-IOV: provide stronger traffic segregation and hardware QoS enforcement
Roadmap
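Handing a container a VF is an ordinary network-namespace move: once the VF's netdev is pushed into the container's namespace, the NIC, not software, segregates its traffic. A minimal sketch, assuming root; the VF netdev name and container PID are placeholders:

```python
#!/usr/bin/env python3
"""Move an SR-IOV VF netdev into a container's network namespace.

The VF netdev name and the container's init PID are placeholders;
requires root. Once moved, the VF is only visible inside the
container, and the NIC enforces the traffic segregation.
"""
import subprocess

VF_NETDEV = "eth2"     # placeholder: netdev of one virtual function
CONTAINER_PID = 12345  # placeholder: PID of the container's init process

subprocess.run(
    ["ip", "link", "set", VF_NETDEV, "netns", str(CONTAINER_PID)],
    check=True,
)
```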
Zero Touch Network Provisioning
Replicates Neutron data structures to external entities via a RESTful API
Mellanox NEO, the fabric management and provisioning tool, provisions the fabric accordingly
Enables the following
• VLAN provisioning on Mellanox switches
• Dynamic InfiniBand partition key (pkey) configuration
Support for bare-metal server provisioning (roadmap)
[Diagram: the Mellanox SDN Assist ML2 driver alongside additional ML2 drivers under the Neutron core in the Neutron server, driving fabric provisioning]
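The replication pattern described above is what an ML2 mechanism driver does: Neutron invokes the driver's post-commit hooks after each database change, and the driver forwards the payload to the external tool. A hedged skeleton, not Mellanox's actual SDN Assist driver; the REST endpoint is a placeholder, and the MechanismDriver import path varies across Neutron releases:

```python
"""Skeleton ML2 mechanism driver that mirrors Neutron network events
to an external fabric-provisioning service over REST.

Sketch only: the NEO-style endpoint URL is a placeholder, and the
MechanismDriver import path varies across Neutron releases.
"""
import requests
from neutron.plugins.ml2 import driver_api as api  # location varies by release

FABRIC_API = "http://neo.example.com/api/networks"  # placeholder endpoint

class FabricSyncMechanismDriver(api.MechanismDriver):
    def initialize(self):
        pass

    def create_network_postcommit(self, context):
        # context.current carries the committed Neutron network dict;
        # replicate the bits the fabric manager needs (e.g. VLAN id).
        net = context.current
        requests.post(FABRIC_API, json={
            "id": net["id"],
            "segmentation_id": net.get("provider:segmentation_id"),
            "network_type": net.get("provider:network_type"),
        }, timeout=5)

    def delete_network_postcommit(self, context):
        requests.delete(f"{FABRIC_API}/{context.current['id']}", timeout=5)
```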
Network-Aware Scheduling
[Diagram: OpenStack Nova sends a scheduling request to a network-aware scheduler, which queries network metrics, topology, and status gathered from the Ethernet network (settings and monitoring) and returns a scheduling recommendation before the VM is spawned]
Hello Mr. Cloud, I need a VM...
• With a high link speed (56GbE, for example)
• With network HA (MLAG)
• With a minimum hop count to my other VMs
• On a hypervisor that supports RDMA (RoCE)
• ...and that is not currently congested
Roadmap
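In Nova terms, constraints like these map onto scheduler filters, which accept or reject each candidate host. A hedged sketch using a hypothetical rdma_roce capability flag; how such a flag would be populated is an assumption, and the filter hook's signature differs between Nova releases:

```python
"""Sketch of a network-aware Nova scheduler filter.

Hypothetical: the 'rdma_roce' capability key and how it gets into
host_state are assumptions, and BaseHostFilter's hook signature
changed across Nova releases (filter_properties vs. spec_obj).
"""
from nova.scheduler import filters

class RdmaCapableFilter(filters.BaseHostFilter):
    """Only pass hosts that advertise RoCE-capable NICs."""

    def host_passes(self, host_state, spec_obj):
        # Assumed convention: compute nodes report an 'rdma_roce'
        # capability that ends up in host_state.
        caps = getattr(host_state, "capabilities", None) or {}
        return bool(caps.get("rdma_roce"))
```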
Ironic Ethernet and InfiniBand Support
Ironic is OpenStack's bare-metal provisioning service
Useful for High Performance Computing (HPC) and Big Data
Ironic currently lacks network provisioning
• Only flat networks; no VLAN/PKEY segregation
Mellanox is working with the community to
• Add Neutron support for Ironic
• Provide zero-touch VLAN switch provisioning
• Enable InfiniBand support for Ironic with Neutron
Roadmap
I’m “Pixie Boots”, the mascot of “Bear Metal” provisioning, a.k.a. Ironic
SR-IOV High Availability
The OpenStack SR-IOV implementation doesn't support SR-IOV HA
Mellanox enables transparent SR-IOV HA on a single NIC (*)
LAG is implemented by the Mellanox NIC, so the VM only sees a single virtual function (VF)
Modes supported
• Active-Active
• LACP
Mellanox will also work with the community to implement non-transparent LAG (two VFs per VM)
[Diagram: a VM's VF driver attached to a single virtual function that the NIC backs with a LAG across ports 1 and 2]
(*) Beta; available on request
Roadmap
Database ROI Comparison: CloudX vs. Standard Cloud
CloudX
• Efficient virtual network with up to 56Gb/s host throughput
• Compute servers: KVM hypervisor, Open-iSCSI with iSER, SR-IOV
• Storage servers: OpenStack Cinder, iSCSI/iSER target, RDMA
• Oracle Database with Swingbench generating and measuring OLTP performance
Conventional Cloud
• Conventional cloud network with 10Gb/s host throughput
• Compute servers: KVM hypervisor, iSCSI, Open vSwitch
• Storage servers: OpenStack Cinder, iSCSI target
• Oracle Database with Swingbench generating and measuring OLTP performance
Benchmark Results – CloudX Wins on Performance & Efficiency
2.2X OLTP Performance
Higher Cost Efficiency;
52% Lower Cost per Transaction
[Charts: for a 750GB database, cost per unit of performance ($/KTPM) and performance in thousands of transactions per minute (KTPM), Conventional Cloud vs. CloudX]
Comprehensive OpenStack Integration for Switch and Adapter
Integrated with major OpenStack distributions
In-box Neutron ML2 support for mixed environments (VXLAN, para-virtual, SR-IOV) over Ethernet
Neutron: hardware support for security and isolation
Accelerating storage access by up to 5x
OpenStack plugins create seamless integration, control, and management