IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory...

77
IBM Shared Memory Communications SMC for z/OS and Linux on Z Jerry Stevens ([email protected]) IBM November 2019 Session EL

Transcript of IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory...

Page 1: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Shared Memory CommunicationsSMC for z/OS and Linux on Z

Jerry Stevens ([email protected])IBM

November 2019Session EL

Page 2: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Trademarks, notices, and disclaimers

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries.BigInsightsBlueMixCICS*COGNOS*DB2*DFSMSdfp

IMSLanguage Environment*MQSeries*Parallel Sysplex*PartnerWorld*

DFSMSdssDFSMShsmDFSORTDS6000*DS8000*

* Registered trademarks of IBM Corporation

The following are trademarks or registered trademarks of other companies.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Java and all Java based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. andLinux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. OpenStack is a trademark of OpenStack LLC. The OpenStack trademark policy is available on the OpenStack website.TEALEAF is a registered trademark of Tealeaf, an IBM Company.Windows Server and the Windows logo are trademarks of the Microsoft group of countries.Worklight is a trademark or registered trademark of Worklight, an IBM Company.UNIX is a registered trademark of The Open Group in the United States and other countries.* Other product and service names might be trademarks of IBM or other companies.

FICON*GDPS*HyperSwapIBM*IBM (logo)*

RACF*Rational*Redbooks*REXXSmartCloud*

System z10*Tivoli*UrbanCodeWebSphere*z13z14

zEnterprise*z/OS*zSecurez Systemsz/VM*

Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.This information provides only general descriptions of the types and portions of workloads that are eligible for execution on Specialty Engines (e.g, zIIPs, zAAPs, and IFLs) ("SEs"). IBM authorizes customers to use IBM SE only to execute the processing of Eligible Workloads of specific Programs expressly authorized by IBM as specified in the “Authorized Use Table for IBM Machines” provided at www.ibm.com/systems/support/machine_warranties/machine_code/aut.html (“AUT”). No other workload processing is authorized for execution on an SE. IBM offers SE at a lower price than General Processors/Central Processors because customers are authorized to use SEs only to process certain types and/or amounts of workloads as specified by IBM in the AUT.

Page 3: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Agenda (z/OS & Linux with SMC)

1. z/OS (SMC Overview – big picture) topics: 1. Review; what is Shared Memory Communications (SMC); SMC-R and SMC-D2. SMC workload applicability / benefits 3. z/OS SMC-AT

2. SMC for Linux on Z topics:Acknowledgements and thanks to contributor Stefan Raspl (IBM) 1. Linux SMC Overview 2. Usage examples3. Support Status4. SMC-tools

3. DB2/SAP SMC-D Performance BenchmarksAcknowledgements and thanks to contributors: Veng Ly and Fimy Hu (IBM)

3

Page 4: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Review: Shared Memory Communications

• Shared Memory Communications is an innovative communications protocol that provides optimized communications allowing applications within separate Operating System instances to communicate directly through “shared memory”

• Applications rendezvous and connect using existing TCP/IP protocols• Memory is “shared” using:

• Remote platform: Industry based Remote Direct Memory Access technology (RoCE)• Same platform: IBM Z Systems Direct Memory Access architecture (ISM)

• Eligible TCP connections dynamically switch from TCP/IP to SMC• SMC leverages your existing application workloads, your TCP/IP (e.g. TLS) and

Ethernet assets while providing optimized network communications!• Shared Memory Communications is provided in the following two variations…

4

Page 5: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Review: SMC-R Shared Memory Communications over RDMA

5

SMC-R enabled platform

OS image OS image

Virtual server instance

server client

RNIC

RDMA technology provides the capability to allow hosts to logically share memory. The SMC-R protocol defines a means to exploit the shared memory for communications - transparent to the applications!

Shared Memory Communicationsvia RDMA

SMCSMC

RDMA enabled (RoCE)

RNIC

Clustered Systems

SMC-R is an open sockets over RDMA protocol that provides transparent exploitation of RDMA (for TCP based applications) while preserving key functions and qualities of service from the TCP/IP ecosystem that enterprise level servers/network depend on! IETF RFC for SMC-R:

http://www.rfc-editor.org/rfc/rfc7609.txt

SMC-R enabled platform

Virtual server instance

shared memory shared memory

Sockets Sockets

Page 6: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Review: SMC-DShared Memory Communications – Direct Memory Accesswith IBM Z System Internal Shared Memory (ISM) feature

6

IBM Z System: z13 and z14

OS image OS ImageLP 1 LP 2

Server

ISMVCHID

The Shared Memory Communications-Direct Memory Access (SMC-D) protocol can significantly optimize intra-CPC Operating Systems communications – transparent to socket applications!•Tightly couples socket API communications / memory within the CPC.

•Eliminates TCP/IP processing in the data path.

•ISM is a Z System firmware solution that leverages existing Operating System virtual memory PCI architecture without requiring any additional hardware.

System z vPCI FirmwareShared Memory Communications

vPCI ISM Virtual FunctionvPCI ISM Virtual Function

Sockets SMC SMC Sockets client

Shared Mem Shared Mem

FID 1 FID 2

Page 7: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

SMC Requirements and Applicability SMC requires / applies to: 1. TCP/IP connections only (TCP)2. peer hosts (client/server) must have direct Ethernet layer 2 connectivity

(SMC does not support IP routing or IPSec). 3. SMC-R requires an RDMA capable NIC (RNIC) and a standard 10GbE Ethernet

Switch (RoCE). The IBM RoCE Express2 also supports 25GbE.

SMC is well suited for workloads that:1. use persistent TCP connections (vs. shorted lived connections)2. are latency sensitive (transactional workloads; can improve response time)3. exchange large amounts of data (CPU savings increase with payload size)

7

Page 8: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

z/OS SMC Applicability Tool (SMC-AT)

§ SMC-AT is a tool that will help customers determine the applicability of SMC-R or SMC-D to their workloads in their current environment with minimal effort and impact

§ Part of the TCP/IP stack: Gather new statistics that are used to project SMC applicability and potential benefits for the current system

§ Minimal system overhead, no changes in TCP/IP network flows; Produces summary reports on potential applicability / benefits of enabling SMC-R

§ Also available now on existing z/OS releases via maintenance–Does not require SMC-R or SMC-D to be enabled –Does not require RoCE Express Features or any specific System z processor–Can be used for determining potential benefits prior to moving to latest software and

hardware levels–SMC-AT can be used to evaluate z/OS workloads for any peer host (e.g. Linux)

8

Page 9: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Agenda (z/OS & Linux with SMC)

1. z/OS introduction2. SMC for Linux on Z topics:

1. Linux SMC Overview 2. Usage examples3. Support Status4. SMC-tools

3. DB2/SAP SMC-D Performance Benchmarks

9

Page 10: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1010IBM Z / © 2018 IBM Corporation 10

SMC Basics / Motivation

What if we had a networking technology that could provide

l low latencyl high throughput

and save CPU cycles at thesame time?

Page 11: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1111IBM Z / © 2018 IBM Corporation 11

SMC Basics / Motivation

What sendingdata throughBSD socketslooks like

Page 12: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1212IBM Z / © 2018 IBM Corporation 12

What if we hada simple bufferto write data toand let hardwaredo the rest...?

SMC Basics / Motivation

Page 13: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 13

The RDMA ApproachSMC Basics / Motivation

§ RDMA (Remote Direct Memory Access) based technology originating from Infiniband (IB)

§ Enables a host to read or write directly from/to a remote host’s memory with drastically reduced use of remote host’s CPU (interrupts required for notification only)

§ Native / direct application exploitation requires rewrite of network-related program logic, deep level of expertise in RDMA and a new programming model

§ Therefore, provide a transparent approach:

- SMC-R: Use RDMA over Converged Ethernet (RoCE) technology

l Unlike IB, RoCE does not require unique network components (host adapters, switches, security controls, etc.)

l Utilize existing Ethernet fabric with RDMA-capable NICs and switches

- SMC-D: Use DMA when both hosts are within a Z system via virtual PCI device

Page 14: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1414IBM Z / © 2018 IBM Corporation 1414

§ For each new TCP connection:

- Start out with a regular TCP/IP connection, advertising (R)DMA capabilities

- If peer confirms, negotiate details about the (R)DMA capabilities & connectivity

- Switch over to an (R)DMA device for actual traffic depending on the peers’ capabilities

- Regular TCP connection through NICs remains active but idle

OverviewSMC Basics / The SMC Protocol

14

Page 15: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1515IBM Z / © 2018 IBM Corporation 15

SMC Basics / The SMC Protocol

§ PNET ID: Physical network identifier

§ Customer-defined value to logically group NICs and RDMA adapters connected to the same physical network within a host.

§ Defined in - IOCDS for any of OSA, RoCE, HiperSockets or

ISM, or - using smc_pnet tool (SMC-R only, all of the

above and virtual networking facilities like z/VM vNICs et al)

§ Typically associate- OSA and RoCE cards, or- HiperSockets and ISM devices

§ Note: PNET IDs help to locate a suitable (R)DMA device for a given NIC within a host. The peer can use totally different PNET IDs (as long as the right devices are grouped)

PNET IDs

15

Page 16: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1616IBM Z / © 2018 IBM Corporation 1616

§ Start out with regular transport via e.g. OSA or HiperSockets

§ SMC capability indicated by TCP experimental option during TCP 3-way handshake

§ CLC handshake used to exchange info for SMC transport→adds additional round-trips

§ Fallback to regular TCP/IP in case of failure at any point during setup

§ In case of matching capabilities: Switch to (R)DMA device

§ No fallback to or usage of regular TCP/IP connection beyond this point!

§ See RFC 7609 “IBM's Shared Memory Communications over RDMA (SMC-R) Protocol”(https://tools.ietf.org/html/rfc7609) for a detailed description

Protocol DetailsSMC Basics / The SMC Protocol

16

Page 17: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1717IBM Z / © 2018 IBM Corporation 17

SMC Basics / The SMC Protocol

Why aHybridProtocol?

Leverages key existing attributes:§ Follows standard TCP/IP connection setup

§ Dynamically switches to (R)DMA (SMC)

§ Preserves critical operational and network management TCP/IP features such as:

- Minimal (or zero) IP topology changes

- Transparent to channel bonding, load balancers and VLANs

- Preserves existing IP security model (e.g. IP filters, policy, VLANs, SSL etc.)

- Minimal network admin / management changes

- Built-in failover capabilities for RDMA devices

17

Page 18: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1818IBM Z / © 2018 IBM Corporation 18

SMC Basics / Benefits

Less latencyLower CPU usage

Firmware /Hardware

Kernel space

User space

Page 19: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

1919IBM Z / © 2018 IBM Corporation 1919

SMC in Action / Deploying SMC

Workload Performance§ rr1c-200x1000 (highly

transactional)With SMC-D, the transaction times are reduced by 40% compared to 56k-HiperSockets (up to 70% reduction for high numbers of parallel connections).

§ rr1c-200x30k (large transactional)With SMC-D, the throughput increases by 2xfor single connection and up to 3.7x for high numbers of parallel connections.

§ str-writex30k (streaming)Throughput increases by 1.4x for single connection streaming workloads and up to 4.1x for high numbers of parallel connections.

Fig 1: rr1c-200x1000

Fig 2: rr1c-200x30k

Fig 3: str-writex30k 19

Page 20: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

2020IBM Z / © 2018 IBM Corporation 20

SMC Basics / Benefits

Full BSDsockets API compatibility

1) Install kernel with SMC-D/R support

2) Install smc-tools (shipped with distro, or see https://ibm.biz/BdiZ5m)

3) In IOCDS:- Define (R)DMA devices- Assign PNET IDs to networking devices- Alternative: Use smc_pnet in Linux (SMC-R

only)

4) In applications’ socket() calls, replace AF_INETwith AF_SMC, i.e.:

int s, ipv6 = 0;

s = socket(AF_SMC, SOCK_STREAM, ipv6);

20

Page 21: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

2121IBM Z / © 2018 IBM Corporation 21

SMC Basics / Benefits

21

Run yourapplicationsunmodified

§ SMC is transparent to existing applications –no changes required

§ Use smc_run, also provided by smc-tools:smc_run <my_application>

§ Or use preload library directly, provided by smc-tools, to enable existing applications:

export LD_PRELOAD=libsmc_preload.so

Note: Will not work with statically linked applications.

21

Page 22: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

2222IBM Z / © 2018 IBM Corporation 22

SMC Basics / Benefits

22

PreserveExistingSecurity Model

§ The hybrid nature of SMC (beginning with regular TCP/IP, then switching to SMC) allows existing IP and TCP layer (i.e. IP and port-based) security features to automatically apply to SMC connections.

§ This includes:- SSL/TLS- IP Filters, Traffic regulation, Intrusion

detection systems- Auditing based on IP addresses and

ports

§ Not supported:- IPSec tunnels- Deep Packet Inspection

§ No changes from a user perspective required

22

Page 23: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

2323IBM Z / © 2018 IBM Corporation 23

Mixing SMC Usage§ Both variants of SMC can be used

concurrently to provide an optimized solution

§ Enable SMC independent of peers’ capabilities; i.e. no commonality in SMC support on all peers required

§ Use- SMC-D for local connections- SMC-R for remote connections- fall-back to regular TCP where

neither SMC variant is supported

SMC Basics / Heterogeneous Environments

23

Page 24: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 24

Agenda§ SMC Basics

- Motivation- The SMC Protocol- Benefits

§ SMC for Linux on Z- SMC-D and SMC-R- smc-tools

§ SMC in Action- Usage Examples- Deploying SMC- Tips & Tricks

§ Outlook§ Miscellaneous

Page 25: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

2525IBM Z / © 2018 IBM Corporation 2525

Supported Environments

§ Linux on Z Environments

- LPAR

yes

- z/VM guestsyes (z/VM 6.3 or later)

- KVM guestsin progress

- Dockerlimited support, to be

improved

§ Operating Systems:

- SMC-D

l Linux on Zl z/OS: IBM z/OS V2R2 (via APAR) or later

- SMC-Rl Linux on Z

l z/OS: IBM z/OS V2R1 (via APAR) or later

l AIX: System P with AIX 7.2, see https://ibm.biz/BdZutT

§ For latest updates seehttp://linux-on-z.blogspot.com/p/smc-for-linux-on-ibm-z.html

SMC in Action

25

Page 26: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

2626IBM Z / © 2018 IBM Corporation 26

SMC in Action

26

SupportedLinuxDistributions

§ RHEL 8

§ SLES 12 SP4: Use Linux kernel level 4.12.14-95.13.1 or higher (available via maintweb update) for essential bug fixes

§ SLES 15 SP1

§ Ubuntu 18.10 or later

§ Note: All shipments include z/OS compatibility

26

Page 27: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

2727IBM Z / © 2018 IBM Corporation 27

§ Direct connectivity over same IP subnet.I.e. no routed traffic, no peers in differentIP subnets

§ (R)DMA device(s) attached and configured

§ PNET IDs assigned

§ TCP only, i.e. no UDP

§ No IPsec (SSL/TLS works)

§ No NAT (violates same subnet prerequisite)

§ (SMC-D only): Classic mode required (i.e. no DPM support)

SMC in Action

Prerequisites

27

Page 28: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 28

Usage Example: SMC-DSMC in Action / Usage Examples

Prerequisites: Applications in different LPARs on same CEC communicating through HiperSockets.ISM (FID: 80) and HiperSockets devices have the same PNET ID configured in IOCDS in each LPAR.# Hotplug ISM device if not yet visible via lspci command (see next step)# Alternative: Use echo 1 > /sys/bus/pci/slots/00000080/power if smc_rnics is not available$ smc_rnics -e 80

# Verify presence of ISM device# NOTE: No extra setup for VLAN usage required$ lspci0001:00:00.0 Non-VGA unclassified device: IBM Internal Shared Memory (ISM) virtual PCI device

# Run application using smc_run$ smc_run foo_socks

# Verify that SMC is really used$ smcss -aState UID Inode Local Address Foreign Address Intf ModeACTIVE 20000 115762 10.101.4.8:60594 10.101.4.49:3220 0000 SMCDACTIVE 20000 112844 10.101.4.8:60592 10.101.4.49:3220 0000 SMCDACTIVE 20000 112605 10.101.4.8:60590 10.101.4.49:3220 0000 SMCD

One

-tim

e Se

tup

Page 29: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 29

One

-tim

e Se

tup

Usage Example: SMC-RSMC in Action / Usage Examples

Prerequisites: Existing Applications in LPARs on separate CECs communicating through OSA card enccw0.0.f500.RoCE Express adapter has network interface ens2 and infiniband interface mlx4_0 - we will use its 1

stport. No PNET IDs

configured.# Verify presence of RoCE card$ lspci0000:00:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

# Set RoCE card interface UP, and verify$ ip link set ens2 up$ ip link show ens23: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

link/ether 82:03:14:32:f1:a0 brd ff:ff:ff:ff:ff:ff

# OPTIONAL (VLANs only): Define an interface, and assign an IP – interface does not need to be in state UP!$ ip link add dev ens2.201 link ens2 type vlan id 201$ ip addr add 192.168.23.42/24 dev ens2.201

# OPTIONAL (no PNET IDs in IOCDs): Configure PNET ID on OSA and RoCE device:$ smc_pnet -a PNET1 -I enccw0.0.f500 -D mlx4_0 -P 1$ smc_pnet -sPNET1 enccw0.0.f500 mlx4_0 1

# Run application using smc_run$ smc_run foo_socks

# Verify that SMC is really used$ smcss -aState UID Inode Local Address Foreign Address Intf ModeACTIVE 20000 115762 10.101.4.8:60594 10.101.4.49:3220 0000 SMCR

Page 30: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

3030IBM Z / © 2018 IBM Corporation 30

Deployment Considerations§ All supported Linux distributions are z/OS

compatible§ Verify that your workload is applicable to SMC

Specifically:- TCP only – e.g. Oracle RAC is known to

predominantly use UDP, hence benefit will be small

- Same IP subnet: Peer must be in same broadcast domain

§ Take into account that SMC-D/R might fall back to the regular TCP/IP interface

§ SMC-R: No link failover (yet)With support pending, HA must either be achieved in layers above networking, or be dispensable

SMC in Action / Deploying SMC

30

§ z/VM- PNET IDs: VSWITCH does not forward PNET ID from

attached OSA to vNICs⇒ Use smc_pnet to configure PNET IDs

for SMC-R manually- LGR: Live Guest Relocation is not yet supported- Recommendation: Utilize passthrough devices

(OSA/HiperSockets/RoCE/ISM) for SMC-D/R usage without any restrictions

§ KVM- PNET IDs: virtio-net devices do not forward PNET IDs

⇒ Use smc_pnet to configure PNET IDs for SMC-R manually

- PCI Passthrough: RoCE Express only, no ISM devices- LGM: Live Guest Migration is not yet supported- Recommendation: Utilize passthrough RoCE Express

cards for SMC-R

§ Docker- Default setup violates SMC’s same-subnet

prerequisite- Provide container with direct access to host’s IP

interface by using option --network host

Page 31: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 31

smc-tools Package Overview

§ Current version: v1.2.2 (download from https://ibm.biz/BdzTZ4)

§ smc-tools provides the following commands:- smc_pnet

l Associate NICs via PNET ID in softwarel Does not modify/create IOCDS entriesl Also works with bonding and VLAN devicesl Note: PNET IDs defined in IOCDS always

override- smc_run: Enable a binary application to use SMC.- smcss: Information about SMC-enabled sockets and

link groups. Includes information on SMC mode used, as well as TCP fallbacks

- smc_rnics (v1.2 or later)l List all available PCI (R)DMA devices, including

PNET IDl Hotplug / hotunplug PCI (R)DMA devices

- smc_dbg (v1.2 or later)l Collect debugging informationl As a side effect, lists PNET IDs for CCW

devices

SMC for Linux on Z

# List all (R)NICs$ smc_rnics FID Power PCI ID PCHID Type Port PNET ID Interface---------------------------------------------------------------------------

1 1 0000:00:00.0 0144 RoCE Express 0 76 ens11 1 0000:00:00.0 0144 RoCE Express 1 ens1d1

80 1 0001:00:00.0 07c0 ISM n/a NETSR81 0

# Hotplug FID 81$ smc_rnics -e 81

# Note: FID 81 is now available$ smc_rnics FID Power PCI ID PCHID Type Port PNET ID Interface---------------------------------------------------------------------------

1 1 0000:00:00.0 0144 RoCE Express 0 76 ens11 1 0000:00:00.0 0144 RoCE Express 1 ens1d1

80 1 0001:00:00.0 07c0 ISM n/a NETSR81 1 0001:00:00.0 07c0 ISM n/a NETSR

Page 32: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

32

SAP on Z Performance TeamPoughkeepsie, New York

Performance Evaluation of SMC-D with SAP Banking on IBM Z

Veng LyFimy Hu

Page 33: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Please note• IBM’s statements regarding its plans, directions, and intent are subject to

change or withdrawal without notice and at IBM’s sole discretion.• Information regarding potential future products is intended to outline our

general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

• Performance is based on measurements and projections in a controlled environment. These measurements are stress tests, not certified benchmarks. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

33

Page 34: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

SMCAT: Eligibility and Suitability Checkfor SAP Banking Environment

34

34

SMC Applicability Interval Report - 05/12/2019, 19:45:35.00SMC Applicability Configuration Parameters - 05/12/2019, 09:46:00.87Interval: 180 minutes IP addresses/subnets being monitored 10.32.40.30/24 100.0.0.0/32

End of configuration parameters

Configured Interval Duration: 180 minutes Actual Interval Duration: 180 minutes

TCP SMC traffic analysis for all matching connections-----------------------------------------------------Includes connections not meeting direct connectivity requirements

100% of all TCP connections can use SMC (eligible) 100% of eligible connections are well-suited for SMC 100% of all TCP traffic (segments) is well-suited for SMC100% of outbound traffic (segments) is well-suited for SMC 100% of inbound traffic (segments) is well-suited for SMC

Interval Details:Total TCP Connections: 234 Total SMC eligible connections: 234

Total SMC well-suited connections: 234 Total outbound traffic (in segments) 856197001

SMC well-suited outbound traffic (in segments) 856197001 Total inbound traffic (in segments) 935804924

SMC well-suited inbound traffic (in segments) 935804373

• 100% eligibility of TCP connections

• 100% suitability of the workload’s TCP traffic

• The tool confirms the expected suitability of SMC for our batch workload

• All application and database servers use this subnet

Page 35: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Introduction to Performance Study• Recent generations of IBM Z are excellent platforms for server consolidation and co-location

• Increasingly more processor cores allows hosting of very large workloads such as SAP in a single Central Electronics Complex(CEC)

• Employing IBM internal cross-memory communication features can improve performance ofco-located application servers in the same Z CEC

• Shared Memory Communication (SMC) offers optimized mainframe intra-system communication which many application workloads on IBM Z can benefit from

• Study Objective: Quantify the performance benefits of SMC:

• Evaluate performance benefits of using SMC communications with the customer-like SAP Banking Account Settlement workload

35

Page 36: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

SMC Basic Concept & Potential Benefits• The diagram below illustrates the differences between sending data through SMC sockets and TCP/IP sockets

36

• SMC eliminates several processing layers of the TCP/IP data path

• may lower CPU utilization and network latency

Page 37: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Test Methodology

• Study Focus: Evaluation of performance differences between IBM Z internal cross-memory communication features

• HiperSockets vs. SMC-D

• Software and hardware configurations are held constant through all test scenarios with the exception of the network communication feature under test

• The following measurements are stress tests, not certified benchmarks

37

Page 38: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Test Environment

38

HardwareTypical Logical SAP Environment

(co-located)

Both database and application servers are co-located within the same z14

Internal LPAR to LPAR communications

Page 39: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

SAP Core Banking: Account Settlement Workload

• Part of the standard SAP Core Banking application

• Workload simulates typical mass account balancing that takes place during nightly processing of a retail bank• Tasks include determining account balances, calculating interests and charges, etc

• Account settlement processing must be completed within a critical batch window to follow banking regulations and operational requirements

• Heavy requirement (focus) on application server processing

• Test workload characteristics:• SAP SBS 7.0• 60 million accounts banking database

39

Page 40: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Key Performance Indicators(KPI)

40

External Throughput Rate(ETR)

Transaction rate of processing

60 million bank accounts over the total

batch elapsed time

Internal Throughput Rate(ITR)

ETR normalized to 100% processor utilization

Serves as a reliable basis for measuring processor capacity

Total Batch Elapsed Time

Time required to complete batch processing of all

accounts

CPU Utilization of Database

and Application Servers

Percentage of processor utilization for the z/OS

and Linux servers

Page 41: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Measurement Results: CPU Utilization

• Significant savings on CPU utilization for the database server while maintain batch elapsed time

• Reduction in CPU utilization may result in significant cost savings on z/OS

41

77% 77%84%

67%74%

84%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

DB Server z/OS Utilization (CP) DB Server z/OS Utilization (zIIP) App Server Linux Utilization (IFL)

CPU Utilization of Database and Application ServersSMC-D vs HiperSockets

HiperSockets SMC-D

1.00 1.00

0.00

0.20

0.40

0.60

0.80

1.00

1.20

HiperSockets SMC-D

Batch Elapsed Time (ratio)SMC-D vs HiperSockets

Page 42: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Measurement Results: ETR and ITR

• Switching from HiperSockets to SMC-D minimizes TCP/IP processing results in:• Improved CPU efficiency• More processor capacity as shown by the 15% higher ITR• May potentially lower the associated software license costs for the SAP database server on IBM Z

• External Throughput Rate(ETR): Transaction rate of processing 60 million bank accounts over the total batch elapsed time

• Internal Throughput Rate(ITR): calculated as the ETR normalized over the measured CPU utilization

42

1.001.15

0.000.200.400.600.801.001.201.40

HiperSockets SMC-D

Database Server ITR (ratio)SMC-D vs HiperSockets

1.00 1.00

0.00

0.20

0.40

0.60

0.80

1.00

1.20

HiperSockets SMC-D

External Throughput Rate (ratio)SMC-D vs HiperSockets

Page 43: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Summary of Measurement Findings

43

SMC-D is recommended for internal network communications in a co-located IBM Z environment for multiple reasons:

• The stress tests conducted in this study demonstrate the maturity and stability of the SMC-D technology

• Switching from HiperSockets to SMC-D does not require changes to existing application software

• SMC may lower CPU utilization and improve network latency

• 15% higher ITR for the SAP Banking Account Settlement workload while maintaining total batch elapsed time

• Reduction in GCP utilization may result in significant savings on software licensing costs on z/OS

• Further reduction of software licensing costs by utilizing IBM Z specialty engines

• zIIPs: available for Db2 database servers running on z/OS

• IFLs: available for SAP application servers running on Linux on Z

• Other application workloads may see similar benefits when utilizing SMC-D

Page 44: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Thank You! Please submit your session feedback!

• Do it online at http://conferences.gse.org.uk/2019/feedback/EL

• This session is EL

Page 45: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Backup

Page 46: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

References and Sources

46

Performance Evaluation of SMC-D with SAP Banking on IBM Zhttps://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102792

Performance Evaluation of SAP Application Servers on Z: https://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102759

IBM Z Connectivity Handbook:http://www.redbooks.ibm.com/abstracts/sg245444.html?Open

Large Systems Performance Reference for IBM Z:https://www-01.ibm.com/servers/resourcelink/lib03060.nsf/pages/lsprindex?OpenDocument

SAP on IBM Z Reference Architecture: SAP for Banking: https://www.sap.com/documents/2015/07/8aab7888-5b7c-0010-82c7-eda71af511fa.html

z/OS Communication Server SMC Applicability Tool: https://www-01.ibm.com/support/docview.wss?uid=swg27044929&aid=1

Page 47: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Getting Started with z/OS SMC (SMC-R checklist)

47

ü 1. Install hardware: RoCE Express features installed (and physically connected to your Ethernet switch)Note. If VLANs are defined for OSA (TCP/IP profile INTERFACE) then your RoCE Ethernet switch ports must match your OSA switch ports (i.e. all eligible VLANs defined in trunk port)

ü 2. HCD Configuration: RoCE HCD definitions completed for RoCE features (e.g. PFIDs/VFs/PNetID). PFIDs are defined for access (and are accessible / online) for the applicable LPARs (z/OS instances).

ü 3. Associated ports:RoCE and OSA physical ports are associated (both are defined in HCD with the same PNet ID)

ü 4. TCP/IP profile: – Globalconfig SMCR PFIDs (enables SMCR and defines PFIDs to be used) and – OSA INTERFACE; verify subnet mask is defined and SMCR is enabled (on by default)

Page 48: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Getting Started with z/OS SMC (SMC-D checklist)

48

ü 1. HCD configuration:ISM1 HCD definitions completed (e.g. PFIDs/VFs/PNetID). PFIDs are defined for access (and are accessible / online) for the applicable LPARs (z/OS instances).

ü 2. Associate ports:ISM CHID and OSA2 physical port(s) are associated (both are defined in HCD with the same PNetID)

ü 3. TCP/IP profile: – Globalconfig SMCD (enables SMCD, ISM PFIDs are dynamically discovered) and – OSA2 INTERFACE; verify subnet mask is defined and SMCD is enabled (on by default)

Notes: 1. ISM requires IBM z13 (GA2) or later. 2. For the TCP/IP connection setup SMC-D can be associated (via PNet IDs) with OSA or HS.

If HS (IQD) will be used for SMC-D then:1. IQD CHID must have a matching PNet ID with your ISM CHID (HCD)2. TCP/IP profile: DYNAMICXCF (HS) or your HS INTERFACE statement must be configured with a subnet mask and

SMCD must be enabled (on by default)

Page 49: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

SMC with z/OS and Linux on Z

Some general reminders / notes:• SMC-R is an open (‘wire’) protocol (RFC 7609). There are no known unique considerations for exploiting SMC

with unlike peer hosts (i.e. the OS type of a peer host is transparent).

• The SMC connection criteria / eligibility remains unchanged. • SMC-D applies within System z. The SMC-D connection protocol is an extension of the SMC-R specifications.

During rendezvous, peer hosts dynamically use SMC-D when possible (when same CPC). A peer host can be enabled for just SMC-R or SMC-D.

• A peer SMC-R host can use a single RoCE card (two RoCE cards are not required). This will form a single link SMC-R LG.

• A peer SMC-R host can use the same card/port for both Ethernet and RoCE.

• There are no known changes (configuration, operational, etc.) required on the z/OS or Linux side (as a server or client) for SMC compatibility. z/OS and Linux SMC solutions are compatible.

49

Page 50: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

z/OS and Linux SMC Preliminary Performance

Preliminary z/OS and Linux micro-benchmarks are provided here: • Recall that networking micro-benchmarks do not include the typical application

business logic (compute / storage I/O etc.). The focus is on the networking technology (application sockets API layer, the OS, communications protocol stack, and interacting with hardware / firmware etc.).

• Preliminary results are shown to illustrate potential benefits. The findings provided here are early / preliminary results, they are not final and are subject to change.

• Results were produced in a controlled environment using standard tools collecting the typical benchmarks / data points (i.e. interactive and streaming workloads, different data sizes, multiple connections etc.). Your results may vary.

• The z/OS version tested was z/OS 2.3 and the Linux version tested was SLES12SP3 (based on a custom version of 4.18 kernel).

50

Page 51: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

SMC-D (z/OS and Linux) Preliminary Performance(interactive workloads compared to TCP/IP HS)

51

* All performance information was determined in a controlled environment. Actual results may vary. Performance information is provided “AS IS” and no warranties or guarantees are expressed or implied by IBM.

Page 52: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

SMC-D (z/OS and Linux) Preliminary Performance(streaming workloads compared to TCP/IP HS)

52

* All performance information was determined in a controlled environment. Actual results may vary. Performance information is provided “AS IS” and no warranties or guarantees are expressed or implied by IBM.

Page 53: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 53

Linux SMC Backup§ SMC Basics

- Motivation- The SMC Protocol- Benefits

§ SMC for Linux on Z- SMC-D and SMC-R- smc-tools

§ SMC in Action- Usage Examples- Deploying SMC- Tips & Tricks

§ Outlook§ Miscellaneous

Page 54: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 54

SMC-D Overview

§ Intra-CEC connectivity using Internal Shared Memory (ISM) devices

§ IBM Z hardware requirements- IBM z13 (requires driver level 27 (GA2)) and z13s, or later- LinuxONE Emperor and LinuxONE Rockhopper, or later- Classic mode only (i.e. DPM not supported)

§ ISM devices- Virtual PCI network adapter of new VCHID type ISM

l No PCI bus usage

l No extra hardware required

- Provides access to memory shared between LPARs- 32 ISM VCHIDs per CPC, 255 VFs per VCHID (8K VFs per CPC

total)I.e. the maximum no. of virtual servers that can communicate over the same ISM VCHID is 255

- Each ISM VCHID represents a unique (isolated) internal network, each having a unique Physical Network ID

§ PNET ID configuration- IOCDS only- Use HiperSockets, OSA or RoCE cards for regular connectivity

SMC for Linux on Z / SMC-D and SMC-R

Page 55: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 55

SMC-R Overview

§ Cross-CEC connectivity using RoCE Express cards

§ IBM Z hardware requirements- IBM z12EC and z12BC or later- LinuxONE Emperor and Rockhopper or later- Classic and DPM mode supported

§ RoCE Express cards- RoCE Express & RoCE Express2 cards supported- Switches need to support and enable Global Pause (standard

Ethernet switch flow control feature as described in IEEE 802.3x)

§ PNET ID configuration- IOCDS or smc_pnet (→see smc-tools package)- Use OSA or RoCE card for regular connectivity

§ Note:- Linux on Z can use a single RoCE card for regular and

RDMA traffic!- No link failover!

SMC for Linux on Z / SMC-D and SMC-R

Page 56: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 56

SMC-R Link GroupsSMC for Linux on Z / SMC-D and SMC-R

§ SMC-R link groups provide for load balancing and recovery

- New TCP connection is assigned to the SMC-R link with the fewest TCP connections

- Load balancing only performed when multiple RoCE Express adapters are available at each peer

§ Full redundancy requires:- Two or more RoCE Express adapters at each

peer- Unique system internal paths for the RoCE

Express adapters- Unique physical RoCE switches

§ Partial redundancy still possible in the absence of one or more of these conditions

§ Linux on Z:- No failover support (yet)

Page 57: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 57

ComparisonSMC for Linux on Z / SMC-D and SMC-R

Feature SMC-R SMC-DIntra-CEC yes yes

Cross-CEC yes no

RDMA Device RoCE ISM

Interface Type PCI PCI

Bus used PCI -

PNET ID Definition

IOCDS, orsmc_pnet

IOCDS

Failover tbd N/a

Upstream Status Initial code upstream in Linux kernel 4.18

Initial code upstream in Linux kernel

4.19

Page 58: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

5858IBM Z / © 2018 IBM Corporation 5858

SMC in Action / Deploying SMC

SMC-D Performance

58

§ Machine: IBM z14

§ Configuration:- 2 LPARs

- SLES 12 SP4 (updated, kernel level4.12.14-95.13-default)

- Cores per LPAR: 4 IFLs- Memory per LPAR: 64GB- Buffer sizes: Distro defaults(!)

§ Benchmark: uperf (https://github.com/uperf/uperf)

§ Baseline: HiperSockets 56K

§ Note: All results are specific to this setup!

Page 59: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

5959IBM Z / © 2018 IBM Corporation 5959

SMC in Action / Deploying SMC

Workload Performance§ rr1c-200x1000 (highly

transactional)With SMC-D, the transaction times are reduced by 40% compared to 56k-HiperSockets (up to 70% reduction for high numbers of parallel connections).

§ rr1c-200x30k (large transactional)With SMC-D, the throughput increases by 2xfor single connection and up to 3.7x for high numbers of parallel connections.

§ str-writex30k (streaming)Throughput increases by 1.4x for single connection streaming workloads and up to 4.1x for high numbers of parallel connections.

Fig 1: rr1c-200x1000

Fig 2: rr1c-200x30k

Fig 3: str-writex30k 59

Page 60: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Performance of SMC-D versus Hipersockets on z15

IBM Z / September 12, 2019 / © 2019 IBM Corporation

DISCLAIMER: Performance results based on IBM internal tests running uperf (downloaded fromhttps://github.com/uperf/uperf/tree/09fbbdb93e4f0e6569bd532ffd5a4d5969d3eb84) to measure network performance between z15LPARs. Results may vary. z15 configuration: 2 LPARs, each with 4 dedicated IFLs, 64 GB memory, SLES 12 SP4 (SMT mode) running uperfwith different network workload profiles. IFLs of both LPARs were placed on the same chip.

SMC-D Performance

SMC-D delivers up to 3.9X more throughput between z15 LPARs compared to using Hipersockets

2.1x

3x

3.7x 3.7x

1.4x

3.4x3.9x 3.9x

60

Page 61: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Performance of SMC-D versus Hipersockets on z15

IBM Z / September 12, 2019 / © 2019 IBM Corporation

DISCLAIMER: Performance results based on IBM internal tests running uperf (downloaded fromhttps://github.com/uperf/uperf/tree/09fbbdb93e4f0e6569bd532ffd5a4d5969d3eb84) to measure networkperformance between z15 LPARs. Results may vary. z15 configuration: 2 LPARs, each with 4 dedicated IFLs, 64GB memory, SLES 12 SP4 (SMT mode) running uperf with different network workload profiles. IFLs of bothLPARs were placed on the same chip.

SMC-D Performance

SMC-D delivers up to 74% shorter response time between z15 LPARs compared to using HiperSockets

61

38%

51%

72% 74%

Page 62: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

Performance of SMC-D versus Hipersockets on z15

IBM Z / September 12, 2019 / © 2019 IBM Corporation

SMC-D Performance

Benchmark Setup– Ran uperf network benchmark with different workload profiles:

• Highly transactional, medium data sizes: iteratively send 200 bytes of data and receive 1000 bytes of data (client point of view)

• Transactional, large data sizes: iteratively send 200 bytes of data and receive 30720 bytes of data (client point of view)

• Streaming writes: continuously write in 30720 byte chunks of data (client point of view)

– Each workload profile was run with 1, 10, 50, and 250 parallel connections

System Stack– z15

• 2 LPARs, each with 4 dedicated IFLs, 64 GB memory, 40 GB DASD storage, running SLES 12 SP4 with SMT enabled

• IFLs of both LPARs were placed on the same chip

• HiperSockets configured with 56k (HS56k) with an MTU size of 57344 B• uperf network benchmark

62

z15

LPAR 1(4 IFL,

64 GB memory)

HS56kPNETX5

SMC-DPNETX5

LPAR 2(4 IFL,

64 GB memory)

HS56kPNETX5

SMC-DPNETX5

Handshake

Traffic

SMC-D Setup

z15

LPAR 1(4 IFL,

64 GB memory)

HS56kPNETX5

LPAR 2(4 IFL,

64 GB memory)

HS56kPNETX5

Handshake

+ Traffic

HiperSocket Setup

Page 63: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

6363IBM Z / © 2018 IBM Corporation 6363

SMC in Action / Deploying SMC § Expect:- Reduction of CPU usage- Lower latency- Higher effective throughput- Higher maximum throughput (SMC-D only)

§ But consider:- CLC handshake adds add’l round trips prior to actual

traffic→Minimum number of transmits required to break even

- Requires additional shared memory ranging from 16 KB to 512 KB (SMC-R) or 1 MB (SMC-D), determined as follows:

l Socket send buffer sizes derived from (in sequence): SO_SNDBUF, or (if not available) net.ipv4.tcp_wmem (default value)

l Socket receive buffer sizes derived from net.ipv4.tcp_rmem, rounded to next higher power of 2

- z/OS:l When communicating with z/OS, performance

benefits as demonstrated by z/OS will materialize independent of remaining potential in Linux

l z/OS: Use SMC-AT (https://ibm.biz/BdZUgp) to estimate potential for SMC in your traffic patterns

PerformanceConsiderations

63

Page 64: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 64

Usage Example: SAP on IBM ZSMC in Action / Deploying SMC

§ Deploy- DB2 SAP Database on z/OS- SAP Central Services and SAP Application

Server on Linux on Z

§ Provides lower latency, less CPU used

§ Whitepaper:Performance Evaluation of SMC-D with SAP Banking on IBM Zhttp://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102792

→ Higher transaction rates

Page 65: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 65

How to verify Hardware Setup

§ Use lspci to check RoCE and ISM device availability:$ lspci0000:00:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family \

[ConnectX-3/ConnectX-3 Pro Virtual Function]0001:00:00.0 Non-VGA unclassified device: IBM Internal Shared Memory (ISM) virtual PCI device

§ (z/VM only) Verify card attachment$ vmcp ‘QUERY PCIFUNCTION’PCIF 00000280 ATTACHED TO S8360018 00000280 ENABLED 10GbE RoCEPCIF 000002E2 ATTACHED TO S8360018 000002E2 ENABLED ISM

§ Verify PNET IDs are set and match (use smc_rnics and/or smc_dbg if available)# RoCE device$ cat /sys/class/net/ens1/device/util_string | iconv -f IBM-1047 -t ASCIINetworkA

# ISM device with PCI ID 0001:00:00.0 according to lspci:$ cat /sys/bus/pci/devices/0001:00:00.0/util_string | iconv -f IBM-1047 -t ASCIIvNetB

# OSA or HiperSockets device$ cat /sys/devices/css0/chp0.`cat /sys/class/net/enccw0.0.f500/device/chpid`/util_string \

| iconv -f IBM-1047 -t ASCIINetworkA

SMC in Action / Tips & Tricks

Page 66: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 66

§ RoCE Express cards: Verify port and NIC states:$ cat /sys/class/infiniband/mlx4_0/ports/1/phys_state5: LinkUp$ cat /sys/class/infiniband/mlx4_0/ports/1/state4: ACTIVE$ ip link show ens23: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 \

qdisc mq state UP mode DEFAULT group default qlen 1000link/ether 82:03:14:32:f1:a0 brd ff:ff:ff:ff:ff:ff

§ RoCE Express cards with VLANs: Verify VLAN interface exists and has IP address assigned (interface can be down)

$ ip addr show ens2.2017: ens2.201: <BROADCAST,MULTICAST,DOWN,LOWER_UP> mtu 1500 \

qdisc fq_codel master virbr0 state UNKNOWN group default \qlen 1000

link/ether fe:54:00:f9:cf:be brd ff:ff:ff:ff:ff:ffinet 192.168.23.42/24 scope global vnet3

valid_lft forever preferred_lft forever

§ Check available free memory via /proc/meminfo:$ grep "^Mem" /proc/meminfoMemTotal: 1710584 kBMemFree: 83404 kBMemAvailable: 1125752 kB

How to verify OS SetupSMC in Action / Tips & Tricks

n Check smcss output for reason code:$ smcss -aState UID Inode Local Address [...] Intf ModeACTIVE 20000 115762 10.101.4.8:60594 [...] 0000 TCP 0x05000000/0x0000521e

§ Troubleshooting z/OS- “Autonomics” function might disable

connections to peers that are unlikely to benefit from SMC (most typically: Short-lived connections exchanging few data) for a certain period of time. In the TCP/IP profile change

GLOBALCONFIG SMCGLOBAL NOAUTOSMC

- See https://ibm.biz/BdZts8 for further details.

Page 67: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 67

How to enable unruly Applications

§ Some applications have involved startup procedures that will not easily work with smc_run

§ Example:DB2 requires registration of environment variables through db2set command.

$ db2set -i db2inst1 \DB2ENVLIST="LD_LIBRARY_PATH \LD_PRELOAD"

$ smc_run db2start

§ If smc_run does not work for an application with PID <p>:

- Check /proc/<p>/environ whether LD_PRELOAD is set correctly!

§ Alternative approaches:- Set LD_PRELOAD in the user ID’s profile

that starts the respective processes, e.g. the DB2 instance owner:

$ echo “export LD_PRELOAD=\libsmc_preload.so” >> ~/.profile

- Use /etc/ld.so.preload to enable the entire system:

$ cat /etc/ld.so.preloadlibsmc_preload.so

§ Note: This will add the preload library to allprocesses on the entire system or started by the respective user! This includes processes not performing any socket operations at all, e.g. the ls command.

§ Therefore, always prefer usage of smc_run

SMC in Action / Tips & Tricks

Page 68: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 68

Agenda§ SMC Basics

- Motivation- The SMC Protocol- Benefits

§ SMC for Linux on Z- SMC-D and SMC-R- smc-tools

§ SMC in Action- Usage Examples- Deploying SMC- Tips & Tricks

§ Outlook§ Miscellaneous

Page 69: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

6969IBM Z / © 2018 IBM Corporation 69

Outlook

69

Moretocome!

§ Performance optimizations

§ Failover support (SMC-R)

§ Blacklisting peer IPs/ports

§ Improved diagnostics

§ Usage statistics

§ ...

69

Page 70: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM Corporation 70

SummaryMiscellaneous

Typical Workloads To Benefit§ Transaction-oriented,

§ latency-sensitive, and

§ bulk data streaming, e.g. when running backups.

Key Attributes§ Leverages existing Ethernet infrastructure

(SMC-R)

§ Transparent to (TCP socket based) application software

§ Preserves existing network addressing-based security models

§ Preserves existing IP topology and network administrative and operational model

§ Transparent to network components such as channel bonding and load balancers

§ Built-in failover capabilities (SMC-R)

Page 71: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

7171IBM Z / © 2018 IBM Corporation 71

Miscellaneous

71

References

§ smc-tools Homepagehttps://www.ibm.com/developerworks/linux/linux390/smc-tools.html

§ Whitepaper: Performance Evaluation of SMC-D with SAP Banking on IBM Zhttp://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102792

§ RFC7609 (SMC-R)https://tools.ietf.org/html/rfc7609

§ Linux on Z (technical):https://www.ibm.com/developerworks/linux/linux390/

§ SMC for Linux on Z:http://linux-on-z.blogspot.com/p/smc-for-linux-on-ibm-z.html

§ Webcastshttps://developer.ibm.com/tv/linux-ibm-z/

§ Blogs- Linux on z distributions new

http://linuxmain.blogspot.com/

- Linux on Z latest development newshttp://linux-on-z.blogspot.com/

- KVM on Zhttp://kvmonz.blogspot.com/ 71

Page 72: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

IBM Z / © 2018 IBM CorporationIBM Z / © 2018 IBM Corporation

Page 73: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

7373IBM Z / © 2018 IBM Corporation 7373

SMC in Action / Deploying SMC

SMC-D Performance

73

§ Machine: IBM z14

§ Configuration:- 2 LPARs

- Fedora28 with custom 4.16 kernel- Cores per LPAR: 10 IFLs- Memory per LPAR: 4GB

§ SMC-D Setup (Client & Server)- Send buffer: 64KB- Receive buffer: 256KB

§ Benchmark: uperf (https://github.com/uperf/uperf)

§ Baseline: HiperSockets 32K

§ Note: All results are preliminary and specific to this setup!

Page 74: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

7474IBM Z / © 2018 IBM Corporation 7474

SMC in Action / Deploying SMC

SMC-D Performance

74

§Workload: rr1c-200x1000§Results:

- Transaction times reduced by 20% with SMC-D(up to 80% reduction for high numbers of parallel connections)

- CPU consumption reduced by 30% to 50% for client and server with SMC-D

Page 75: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

7575IBM Z / © 2018 IBM Corporation 7575

SMC in Action / Deploying SMC

SMC-D Performance

75

§ Workload: rr1c-200x30K

§ Results:- Throughput increases by 1.4x

(single connection) and up to 5x (high number of parallel connections) with SMC-D

- CPU consumption reduced by 35% to 45% for the client with SMC-D

- CPU consumption reduced by 15% to 35% for the server with SMC-D

Page 76: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

7676IBM Z / © 2018 IBM Corporation 7676

SMC in Action / Deploying SMC

SMC-D Performance

76

§ Workload: str-writex30k

§ Results:- Throughput increases by 1.7x

(single connection) and up to 6.9x (high number of parallel connections) with SMC-D

- CPU consumption reduced by 10% to 50% for the client with SMC-D

- CPU consumption reduced by 40% to 50% for the server with SMC-D

Page 77: IBM Shared Memory Communications › 2019 › presentations › EL.pdf · IBM Shared Memory Communications SMC for z/OS and Linux on Z JerryStevens(sjerry@us.ibm.com) IBM November

End of Materials