[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud Performance and...


Sr. SE 정연구

Feb. 18, 2016

Mellanox CloudX – Enhanced Cloud Performance Acceleration and Efficient Virtual Networking


Leading Supplier of End-to-End Interconnect Solutions

Software and Services | ICs | Switches/Gateways | Adapter Cards | Cables/Modules | Metro/WAN

Store, Analyze – Enabling the Use of Data

At the Speeds of 10, 25, 40, 50, 56 and 100 Gigabit per Second

Comprehensive End-to-End InfiniBand and Ethernet Portfolio


25Gb/s is the new 10, 50 is the new 40, and 100 is the Present

Flexibility, Opportunities, Speed

Open Ethernet, Zero Packet Loss

Most Cost-Effective Ethernet Adapter

Same Infrastructure, Same Connectors

One Switch. A World of Options. 25, 50, 100Gb/s at Your Fingertips


Spectrum – 25GbE, 50GbE and 100GbE Open Ethernet Switch

Leading performance & power

• World’s only non-blocking 100G, 6.4Tb/s switch

• Sub-300ns port-to-port latency

• RDMA over Converged Ethernet

• Lowest power (<135W)

Cloud-scale

• Network virtualization at scale

• Bandwidth optimization

• Flexible SDN pipeline

Key Features

• 32 ports of 40/100GbE

• 64 ports of 10/25/50GbE

• Advanced QoS and congestion control

• Dynamically shared buffer

One Switch. A World of Options.

Zero Packet Loss For All Packets Sizes


ConnectX-4 Lx: Affordable 25GbE Performance

Affordable 25GbE / 50GbE Performance

• 2.5X the performance vs 10 GbE with same connectors / infrastructure

• 0.7us latency, 75 million messages per second

• The standard against which other Ethernet adapters will be compared

Optimized to deliver higher ROI

• Public and private Clouds

• Hyperscale Web 2.0 infrastructures

• Cost Effective Big Data and Analytics systems

Advanced features

• Advanced Network Virtualization & Overlay Networks Offloads

- VXLAN, NVGRE, GENEVE

• RDMA over Converged Ethernet (RoCE)

• Multi-Host Technology

• Fully featured embedded switch (eSwitch)

- Hardware acceleration of Open vSwitch

- VM traffic steering, monitoring, and enforcement


Converged Infrastructure Relies on Efficient Data Movement

Efficient Data Movement

• Multi-Host & eSwitch: Embedded hardware OVS switch – Advanced Flow Steering Engine

• Virtual network acceleration (VXLAN, NVGRE, GENEVE)

• RDMA – Efficient Data Exchange - Low Latency, Low CPU Overhead

[Diagram: efficient data movement with RDMA, virtual overlay network acceleration (NVGRE, VXLAN, GENEVE), and the embedded switch (hardware OVS) serving VMs across multiple CPUs]


CloudX is a group of reference architectures for building the most efficient, high-performance and scalable Infrastructure-as-a-Service (IaaS) clouds, based on Mellanox's superior interconnect and off-the-shelf building blocks

Supports the most popular cloud software

• VMware

• OpenStack

• Windows Azure Pack (WAP)

CloudX: Optimized Cloud Platform


Data Path Acceleration

CloudX


Based on Mellanox ConnectX-4 NIC family and Switch-IB/Spectrum switches

Bring the astonishing 100Gbps speeds to OpenStack

• Both VMs and Hypervisors

• Accelerations are critical to reach line rate

- RDMA, Overlay, etc.

100Gbps Cloud!


Overlay Networks (VXLAN/NVGRE/GENEVE) Acceleration

Overlay Network Virtualization: Isolation, Simplicity, Scalability

[Diagram: physical view – two servers (VM1–VM4 and VM5–VM8) connected through Mellanox SDN switches & routers; virtual view – three isolated virtual domains built as NVGRE/VXLAN overlay networks]

Virtual overlay networks simplify management and VM migration; overlay accelerators in ConnectX-3 Pro enable bare-metal performance


Advantages of Overlay Networks

• Simplification

• Automation

• Scalability

Problem: Performance Impact!!

• Overlay tunnels add network processing

- Limits bandwidth

- Consumes CPU

Solution:

• Overlay Network Accelerators in NIC

• Penalty-free overlays at bare-metal speed

• HW encap/decap (future)

Turbocharge Overlay Networks with ConnectX-3/4 NICs
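As a rough illustration of what the NIC is offloading, the sketch below creates a VXLAN device with pyroute2 and then checks whether the underlying NIC advertises UDP-tunnel segmentation offload; the interface names, VNI and multicast group are assumptions for this example.

```python
import subprocess
from pyroute2 import IPRoute

UNDERLAY = "eth0"   # assumed underlay interface name

ip = IPRoute()
link = ip.link_lookup(ifname=UNDERLAY)[0]
# Create a VXLAN device (VNI 42) on top of the underlay interface.
ip.link("add", ifname="vxlan42", kind="vxlan",
        vxlan_id=42, vxlan_link=link, vxlan_group="239.1.1.1")

# A NIC with VXLAN offload reports tx-udp_tnl-segmentation as "on".
features = subprocess.run(["ethtool", "-k", UNDERLAY],
                          capture_output=True, text=True).stdout
print([line for line in features.splitlines() if "udp_tnl" in line])
```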


PCIe device presents multiple instances to the OS/Hypervisor

Enables Application Direct Access

• Reduces CPU overhead and improves application performance

Enables RDMA to the VM

• Low-latency applications benefit even on virtualized infrastructure

Single Root I/O Virtualization (SR-IOV)

[Charts: RoCE over SR-IOV latency (µs) for 2B/16B/32B messages across 1–8 VMs, and RoCE over SR-IOV throughput (Gb/s) across 1–16 VMs]
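For reference, SR-IOV virtual functions are created through the kernel's sysfs interface before Nova/Neutron can hand them to VMs. A minimal sketch, assuming a PF netdev named ens2f0 and root privileges (the Neutron SR-IOV agent and the Nova PCI whitelist still have to be configured separately):

```python
from pathlib import Path

PF = "ens2f0"                                   # assumed PF netdev name
dev = Path(f"/sys/class/net/{PF}/device")

total = int((dev / "sriov_totalvfs").read_text())      # VFs the NIC can expose
(dev / "sriov_numvfs").write_text("0")                  # reset before changing the count
(dev / "sriov_numvfs").write_text(str(min(8, total)))   # create up to 8 VFs
print(f"{PF}: enabled {min(8, total)} of {total} possible VFs")
```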


Value proposition: Remote Direct Memory Access (RDMA)

ZERO Copy Remote Data Transfer

Low Latency, High Performance Data Transfers

InfiniBand: 56Gb/s | RoCE*: 40/56Gb/s

Kernel Bypass, Protocol Offload

* RDMA over Converged Ethernet

[Diagram: application buffers are exchanged directly by the hardware, bypassing the kernel on both hosts]
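The latency and bandwidth claims above are the kind of numbers the standard perftest tools report. A hedged sketch that drives ib_send_lat from Python, assuming the perftest package is installed, the RDMA device is named mlx5_0 (see ibv_devices), and a server instance is already running on the peer:

```python
import subprocess

DEVICE = "mlx5_0"        # assumed RDMA device name
PEER = "192.168.0.2"     # assumed address of the host running `ib_send_lat -d mlx5_0`

# Client side: measure send latency for 64-byte messages over the RDMA fabric.
result = subprocess.run(["ib_send_lat", "-d", DEVICE, "-s", "64", PEER],
                        capture_output=True, text=True, check=True)
print(result.stdout)     # reports min/typical/max latency in microseconds
```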


Using OpenStack's built-in components and management (Open-iSCSI, the tgt target, Cinder), no additional software is required; RDMA support is already in-box and is used by our OpenStack customers!

RDMA Provide Fastest OpenStack Storage Access

[Diagram: compute servers run the KVM hypervisor with Open-iSCSI/iSER initiators; VMs reach, over the switching fabric, iSCSI/iSER targets (tgt) on storage servers with local disks and an RDMA cache, all managed by OpenStack Cinder – using RDMA to accelerate iSCSI storage]

[Chart: write bandwidth (MB/s) vs. I/O size (1KB–256KB); iSER with 4/8/16 VMs approaches the PCIe limit, roughly 6X the bandwidth of iSCSI with 8/16 VMs]

RDMA enables 6X more bandwidth, 5X lower I/O latency, and lower CPU utilization
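Switching a Cinder LVM backend from plain iSCSI to iSER is essentially a one-option change on the target side. A minimal sketch, assuming a Liberty/Mitaka-era cinder.conf with a backend section named lvmdriver-1 (newer releases call the option target_protocol); afterwards cinder-volume is restarted and the initiator attaches through open-iscsi's iser transport:

```python
import configparser

CONF = "/etc/cinder/cinder.conf"
BACKEND = "lvmdriver-1"          # assumed backend section name

cfg = configparser.ConfigParser(interpolation=None)  # cinder.conf may contain '%'
cfg.read(CONF)
cfg[BACKEND]["iscsi_protocol"] = "iser"              # plain "iscsi" is the default
with open(CONF, "w") as f:
    cfg.write(f)
# Then restart cinder-volume on the storage node.
```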


CEPH and Networks

High performance networks enable maximum cluster availability

• Clients, OSD, Monitors and Metadata servers communicate over multiple network layers

• Real-time requirements for heartbeat, replication, recovery and re-balancing

Cluster (“backend”) network performance dictates cluster’s performance and scalability

• “Network load between Ceph OSD Daemons easily dwarfs the network load between Ceph Clients and the Ceph Storage Cluster” (Ceph documentation)


Accelio, High-Performance Reliable Messaging and RPC Library

Open source!

• https://github.com/accelio/accelio/ && www.accelio.org

Faster RDMA integration into applications

Asynchronous

Maximizes message and CPU parallelism

Enables > 10GB/s from a single node

Enables < 10µsec latency under load

Accelerating Ceph with RDMA – Work in Progress

[Chart: Ceph read IOPS, TCP vs. RDMA – 140K IOPS over TCP vs. 434K IOPS over RDMA]

Roadmap
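The work-in-progress path shown here went through Accelio's XioMessenger (ms_type = xio, experimental at the time); later Ceph releases ship an async RDMA messenger instead. A hedged sketch of the ceph.conf keys involved – the option names should be checked against your Ceph version:

```python
from pathlib import Path

# Append an RDMA messenger section to ceph.conf; every daemon and client
# must use the same messenger type, so roll this out cluster-wide.
rdma_snippet = """
[global]
ms_type = async+rdma
ms_async_rdma_device_name = mlx5_0
"""
with Path("/etc/ceph/ceph.conf").open("a") as f:
    f.write(rdma_snippet)
```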


Virtual Switch Offload

Virtual switches are used as the forwarding plane in the hypervisor

Virtual switches implement extensive support for SDN (e.g. enforcing policies) and are widely used in the industry

SR-IOV technology allows direct connectivity to the NIC; as such, it bypasses the virtual switch and the policies it can enforce

Goal

Enable an SR-IOV data plane with the OVS control plane

• In other words, enable support for most SDN controllers with an SR-IOV data plane

Offload OVS flow handling (classification, forwarding, etc.) to the Mellanox eSwitch

[Diagram: some VMs attach to the vSwitch through tap devices, while others connect via SR-IOV directly to the NIC's embedded switch]

Roadmap


Open vSwitch – In a Nutshell

Forwarding

• Flow-based forwarding

• The decision about how to process a packet is made in user space

• The first packet of a new flow is directed to ovs-vswitchd; subsequent packets hit the cached entry in the kernel

OVS Overview: http://openvswitch.org/slides/OpenStack-131107.pdf

Roadmap
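That slow-path/fast-path split is easy to observe on any OVS host: the OpenFlow tables live in user space while the per-flow cache sits in the kernel datapath. A small sketch, assuming the standard OVS CLI tools and OpenStack's br-int integration bridge:

```python
import subprocess

def dump(*cmd):
    print("$", " ".join(cmd))
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)

dump("ovs-ofctl", "dump-flows", "br-int")   # user-space OpenFlow rules (slow path)
dump("ovs-dpctl", "dump-flows")             # kernel datapath cache (fast path)
```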


Virtual Switch Offload - Solution

Solution

Use Open vSwitch as the management interface and control plane for the embedded switch (eSwitch)

Motivation

• Enable an easy, friendly, well-known and community-accepted management framework for the eSwitch

• Leverage the Open vSwitch control plane and SDN capabilities to control the eSwitch forwarding plane

[Diagram: OVS-eSwitch in software manages one netdev representor port per VF; the hardware eSwitch forwards packets between the VFs, the PF (wire) and the host IP interface, punting exceptions to the user-space path, while para-virtual netdevs remain available in the hypervisor]

Roadmap
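On recent kernels the representor model sketched above is configured by flipping the NIC's embedded switch into switchdev mode and adding the per-VF representor netdevs to OVS. This mechanism post-dates the talk, so the sketch below is illustrative only; the PCI address, bridge and representor names are assumptions.

```python
import subprocess

PCI = "0000:03:00.0"   # assumed PCI address of the ConnectX PF
BRIDGE = "br-int"      # assumed OVS bridge
REP = "eth2_0"         # assumed representor netdev for VF 0

def run(*cmd):
    subprocess.run(cmd, check=True)

# Expose one representor netdev per VF by switching the eSwitch mode.
run("devlink", "dev", "eswitch", "set", f"pci/{PCI}", "mode", "switchdev")
# Flows OVS installs on the representor can then be offloaded to the eSwitch,
# while exception packets still reach ovs-vswitchd in user space.
run("ovs-vsctl", "add-port", BRIDGE, REP)
```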


Containers

Containers provide lightweight and efficient virtualization

In host networking, Linux cgroups are used to segregate traffic

Mellanox will enable advanced network services for containers

• RDMA – enable RDMA for InfiniBand and RoCE configurations

• SR-IOV – provide stronger traffic segregation and hardware QoS enforcement

Roadmap
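A hedged sketch of the host-network container case: exposing the host's RDMA character devices to a container with docker-py so verbs applications (RoCE or InfiniBand) can run inside it. The image name and device paths are assumptions; the SR-IOV variant would attach a VF as the container's network interface instead.

```python
import docker  # pip install docker

client = docker.from_env()
client.containers.run(
    "mellanox/rdma-test",                 # hypothetical image name
    "ib_send_bw -d mlx5_0",               # run a bandwidth test inside the container
    devices=[
        "/dev/infiniband/uverbs0:/dev/infiniband/uverbs0:rwm",
        "/dev/infiniband/rdma_cm:/dev/infiniband/rdma_cm:rwm",
    ],
    network_mode="host",                  # the "in host network" case from the slide
    detach=True,
)
```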


Control Plane Enhancements

CloudX


Replicates Neutron data structures to external entities via a RESTful API

The Mellanox NEO fabric management and provisioning tool provisions the fabric accordingly

Enables the following

• VLAN provisioning on Mellanox switches

• Dynamic InfiniBand partition key (pkey) configuration

Support for bare-metal server provisioning (roadmap)

Zero Touch Network Provisioning

[Diagram: the Neutron server hosts the Neutron core, the Mellanox SDN Assist ML2 driver and additional ML2 drivers; the Mellanox driver hands off fabric provisioning]
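Conceptually, the ML2 driver mirrors each Neutron network/port event to NEO over REST, and NEO pushes the VLAN (or InfiniBand pkey) to the right switch ports. The sketch below is purely illustrative: the URL, credentials and payload fields are invented placeholders, not the documented NEO API.

```python
import requests

NEO = "https://neo.example.com/neo/api"          # hypothetical endpoint
payload = {
    "network_id": "net-42",                      # Neutron network being bound
    "segmentation_id": 1042,                     # VLAN to provision on the fabric
    "switch_ports": ["switch-01/1/17"],          # ports facing the hypervisor
}
resp = requests.post(f"{NEO}/provisioning/vlan", json=payload,
                     auth=("admin", "admin"), verify=False, timeout=10)
resp.raise_for_status()
```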


Network-Aware Scheduling

[Diagram: OpenStack Nova sends a scheduling request to the network-aware scheduler; the scheduler queries the Ethernet network for metrics (topology, status, settings, monitoring) and returns a scheduling recommendation before the VM is spawned]

Hello Mr. Cloud, I need a VM…

• With a high link speed (56GbE, for example)

• With network HA (MLAG)

• With a minimum hop count from my other VMs

• On a hypervisor that supports RDMA (RoCE)

• …and that is not currently congested

Roadmap
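One way such requirements could plug into Nova is a custom scheduler filter. A hedged sketch, assuming a Nova release whose BaseHostFilter uses the (host_state, spec_obj) signature; get_fabric_metrics stands in for a query to the fabric manager and is not an existing API.

```python
from nova.scheduler import filters


def get_fabric_metrics(hostname):
    # Placeholder: a real implementation would query the fabric manager
    # (e.g. NEO) for link speed, RoCE capability and congestion counters.
    return {"roce_capable": True, "link_speed_gbps": 56, "uplink_utilization": 0.35}


class NetworkAwareFilter(filters.BaseHostFilter):
    """Reject hosts that cannot satisfy the VM's network requirements."""

    def host_passes(self, host_state, spec_obj):
        metrics = get_fabric_metrics(host_state.host)
        wants_roce = spec_obj.flavor.extra_specs.get("network:roce") == "true"
        if wants_roce and not metrics["roce_capable"]:
            return False
        # Skip hypervisors whose uplinks are currently congested.
        return metrics["uplink_utilization"] < 0.8
```

The filter would then be added to the scheduler's filter list in nova.conf like any other host filter.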


Ironic Ethernet and InfiniBand Support

Ironic is OpenStack's bare-metal provisioning service

Useful for High Performance Computing (HPC) and Big Data

Ironic lacks network provisioning

• Only flat networks; no VLAN/PKEY segregation

Mellanox is working with the community to

• Add Neutron support for Ironic

• Provide zero-touch VLAN switch provisioning

• Enable InfiniBand support for Ironic with Neutron

Roadmap

I’m “Pixie Boots”, the mascot of “Bear Metal” provisioning, a.k.a. Ironic


SR-IOV High Availability

The OpenStack SR-IOV implementation doesn’t support SR-IOV HA

Mellanox enables transparent SR-IOV HA on a single NIC

LAG is implemented by the Mellanox NIC, so the VM sees only a single Virtual Function (VF)

Modes supported

• Active-Active

• LACP

Mellanox will also work with the community to implement non-transparent LAG (2 VFs per VM)

[Diagram: the VM's VF driver sees a single virtual function; inside the NIC, ports 1 and 2 are bonded into a LAG beneath that VF on the host]

(*) Beta – available upon request

Roadmap
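On the host side, this kind of NIC-level LAG is typically triggered by enslaving the NIC's two PF netdevs into a single Linux bond, after which the VM keeps seeing one HA-capable VF. A minimal pyroute2 sketch, with the interface names and the LACP mode as assumptions:

```python
from pyroute2 import IPRoute

PFS = ("ens2f0", "ens2f1")   # assumed names of the NIC's two physical ports

ip = IPRoute()
ip.link("add", ifname="bond0", kind="bond", bond_mode=4)  # 4 = 802.3ad (LACP); 1 = active-backup
bond = ip.link_lookup(ifname="bond0")[0]
for pf in PFS:
    idx = ip.link_lookup(ifname=pf)[0]
    ip.link("set", index=idx, state="down")   # slaves must be down before enslaving
    ip.link("set", index=idx, master=bond)
ip.link("set", index=bond, state="up")
```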


Summary & Case Study


Database ROI comparison – CloudX vs. Standard cloud

CloudX setup: efficient virtual network with up to 56Gb/s host throughput – compute servers run the KVM hypervisor with Open-iSCSI/iSER and SR-IOV; storage servers run OpenStack Cinder with an iSCSI/iSER target over RDMA.

Conventional cloud setup: conventional cloud network with 10Gb/s host throughput – compute servers run the KVM hypervisor with iSCSI and Open vSwitch; storage servers run OpenStack Cinder with an iSCSI target.

In both setups, an Oracle Database driven by Swingbench generates and measures OLTP performance.


Benchmark Results – CloudX Wins on Performance & Efficiency

2.2X OLTP Performance

Higher Cost Efficiency: 52% Lower Cost per Transaction

[Charts: cost per unit of performance ($/KTPM) and performance in thousands of transactions per minute (KTPM) for a 750GB database – conventional cloud vs. CloudX]
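A quick consistency check on the two charts (a sketch; the deck gives ratios, not absolute prices):

```python
perf_ratio = 2.2            # CloudX OLTP throughput vs. conventional cloud
cost_per_ktpm_ratio = 0.48  # "52% lower cost per transaction"

# cost_per_ktpm = system_cost / ktpm, so the implied system-cost ratio is:
system_cost_ratio = cost_per_ktpm_ratio * perf_ratio
print(f"CloudX system cost ~= {system_cost_ratio:.2f}x the conventional setup")
# ~1.06x: roughly the same hardware budget delivers 2.2x the transactions.
```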


Comprehensive OpenStack Integration for Switch and Adapter

Integrated with major OpenStack distributions, in-box

Neutron ML2 support for mixed Ethernet environments (VXLAN, PV, SR-IOV)

Neutron: hardware support for security and isolation

Accelerating storage access by up to 5X

OpenStack plugins create seamless integration, control, & management


Thank You