VIRT1773BU

Deep Dive on pNUMA & vNUMA – Save your SQL VMs from certain DoomA!

Rob Girard, Principal TME
Shawn Meyers, SQL Server Principal Architect

#VMworld #VIRT1773BU
VMworld 2017

Virtualizing Applications Track Sessions and Offerings

• 30 Breakout Sessions with 2 Panels & 3 Quick Talks
• 10 BCA Meet-The-Experts sessions (15-min 1-on-1 appointments)
• 2 Birds-of-a-Feather special invitation receptions (Oracle & SAP)
• 5 Group Discussions
• 3 Saturday Full-Day Applications Bootcamps
• Sign up for the Independent Oracle User Group (IOUG) VMware Special Interest Group (SIG): www.ioug.org/vmware

The Percentage of Applications in Virtualized Infrastructure Has Increased Dramatically Over the Last Few Years
(VMware Core Metrics Survey 2016, N = 1024)

% of respondents running the application in virtualized infrastructure, by region (NA / EU / AP / BRIC) and company size (SMB / COMM / ENT):

Application                              Total   NA    EU    AP    BRIC  SMB   COMM  ENT
Microsoft SQL                            81%     80%   81%   75%   84%   75%   81%   86%
Custom/Industry-Specific Business Apps   65%     57%   70%   66%   71%   59%   70%   68%
Microsoft Exchange                       53%     52%   55%   49%   58%   48%   51%   60%
Microsoft SharePoint                     52%     61%   44%   43%   51%   41%   56%   60%
SAP                                      46%     36%   51%   48%   55%   32%   45%   59%
Oracle Databases                         33%     32%   29%   40%   38%   32%   35%   34%
IBM Middleware                           30%     38%   22%   24%   31%   24%   33%   34%
Oracle Applications                      29%     26%   28%   30%   36%   24%   37%   30%
High Performance Computing               29%     18%   29%   41%   40%   21%   31%   35%
Oracle Middleware                        22%     19%   20%   26%   29%   18%   24%   26%
Respondents (N)                          1024    388   289   139   208   401   217   406

Where Can I Learn More?

▪ Business Critical Applications VMware.com Homepage
  • https://www.vmware.com/solutions/business-critical-apps.html
▪ VMware – DellEMC Collaborative Collateral and DBTA Surveys
  • http://www.dbta.com/emc
▪ Blogs
  • vSphere Blog: https://blogs.vmware.com/vsphere/
  • One Stop Shop – All Oracle on VMware SDDC: https://blogs.vmware.com/apps/2017/01/oracle-vmware-collateral-one-stop-shop.html
  • VMware IOUG Special Interest Group: http://vmsig.org/

Disclaimer

• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

About Shawn

Shawn Meyers
• SQL Server Principal Architect, practice lead
• Experience in VMware, Microsoft, SQL Server, storage infrastructure, and performance tuning
• Working in IT since 1992, SQL Server since 1996, VMware since 2009

@1dizzygoose | linkedin.com/in/shawnmeyers42

About Rob

Rob Girard
• Principal Technical Marketing Engineer @ Tintri as of Jan 2014
• Working in IT since 1997 with >12 years of VMware experience
• vExpert, VCAP4/5-DCA, VCAP4-DCD, VCP2/4/5, MCSE, CCNA and TCSE

@robgirard | www.linkedin.com/in/robgirard

2 Minute Version

• Always use a “Green Line” configuration to match an optimized VM size to the underlying physical topology, while presenting the correct sockets & cores to the guest OS
• Leave Hot Add CPU off
• Adjust Virtual Machine Advanced Settings:
  • numa.autosize.once = FALSE
  • numa.autosize = TRUE (deprecated in vSphere 6.5, which defaults to TRUE)
• Leave everything else alone – VMware does a great job of managing vNUMA
• If you want to know why, what all the other knobs are & their impact, as well as our testing to prove these settings… STICK AROUND!
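For reference, here is a minimal sketch of what those recommendations look like in a VM’s .vmx file. The socket/core split is illustrative and assumes a host whose pNUMA nodes hold 12 cores each; match it to your own hardware:

numa.autosize.once = "FALSE"
numa.autosize = "TRUE" (deprecated in 6.5, where the default is already TRUE)
cpuid.coresPerSocket = "12" (match the cores per pNUMA node of your host)
vcpu.hotadd = "FALSE" (leave Hot Add CPU off so vNUMA stays enabled)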

Introduction

• Met at the SQL Elite Workshop, hosted by VMware and Tintri (April 2015)
• Partnered to share expertise on different aspects of virtualization
• Delivered the VAP6433 Group Discussion session @ VMworld 2015
• This session summarizes the research & lab work behind that session
• For those who want to understand how it works under the covers

Agenda

• Explain pNUMA and vNUMA
• How vNUMA works in VMware
• vNUMA balancing and boundaries
• Advanced vNUMA settings
• Lab results & findings
• Monitoring vNUMA

Non-Uniform Memory Access (NUMA)

SMP vs NUMA

• Large physical machines ran into scale problems with memory access
• NUMA was created to divide the memory address space up between CPUs

[Diagram: Symmetrical Multiprocessing (SMP) – four CPUs sharing a single memory controller and I/O controller – vs NUMA – two nodes, each with its own CPUs, memory controller, and I/O controller, joined by an interconnect]

AMD NUMA

• There are 2 NUMA nodes per processor
• A 4-socket server will have 8 NUMA nodes

[Diagram: four AMD CPUs, each with its own memory controller, sharing an I/O controller]

Intel NUMA

• Intel processors have one NUMA node per processor
• Notice the QPI links between each CPU

[Diagram: four Intel CPUs, each with its own memory controller, connected by QPI links, with two I/O controllers]

Intel Cluster On Die (COD)

• Splits a single processor into two NUMA nodes
• Affects processors with 10 cores or more
• Available on Haswell (v3) and later
• Controlled in BIOS; the OEM default is recommended
• Performance impact to ESXi varies up to 35%, depending on workload, according to VMware

Graphic from https://www.starwindsoftware.com/blog/numa-and-cluster-on-die

pNUMA vs vNUMA

• pNUMA: the NUMA architecture of the physical machine
• vNUMA: the virtual NUMA topology presented to a virtual machine

• vNUMA presents the pNUMA nodes to the virtual machine’s OS
• Since vNUMA is software, we can tune it when it does not automatically match the desired configuration
• Windows, Linux, and SQL Server are all natively NUMA-aware, and have been for a very long time

Soft NUMA

• SQL Server has long had a concept called soft NUMA
• The behavior changed in SQL Server 2016: it now automatically creates logical NUMA nodes of up to 8 cores each
• Works in conjunction with VMware vNUMA; it is not a substitute
• SQL Server and Intel engineers have both found that 8 cores is the magic number for optimal memory throughput
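A quick way to see how soft NUMA landed on an instance is through the standard DMVs (a sketch; the softnuma_configuration columns were added in SQL Server 2016):

select softnuma_configuration_desc, cpu_count
from sys.dm_os_sys_info

select node_id, memory_node_id, online_scheduler_count
from sys.dm_os_nodes
where node_state_desc <> 'ONLINE DAC'

The first query reports whether soft NUMA is off, automatic, or manually configured; the second returns one row per soft-NUMA node, mapped to its underlying hardware memory node.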

vNUMA

• Introduced in vSphere 5.0, improved in 5.5, 6.0 & 6.5
• By default, vNUMA only comes into play when a VM has 9 vCPUs or more
• If your host has 4- or 6-core processors and VMs with more vCPUs than cores per socket, you WILL have NUMA issues!
• Consider changing numa.vcpu.min on the virtual machine to allow vNUMA to take effect below this threshold; this can be set at the VM level

[Diagram: a hypervisor presenting two vNUMA nodes, each backed by a physical CPU and memory controller, to the guest OS and application]
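For example (an illustrative sketch), to let an 8-vCPU VM on a host with 4- or 6-core sockets see the physical topology, lower the threshold in the VM’s advanced settings:

numa.vcpu.min = "8" (default 9; vNUMA now takes effect at 8 vCPUs)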

Test Methodology, Tools & Lab Setup

1. In-guest analysis
2. Host memory usage analysis: ESXTOP (m for memory, f to choose fields, g for NUMA fields)
3. .vmx file analysis (to validate changes made via the GUI, vMotions to other hardware, impact of reboot vs power cycle, FIRST BOOT vs others, etc.)
4. Worst-case analysis – pinning CPUs & memory to specific cores & nodes

Test Methodology, Tools & Lab Setup (cont’d)

Lab:
• 1 x AMD server: 2 x 16 cores + 256 GB RAM
• 1 x Intel server: 2 x 16 cores + 384 GB RAM
• Tintri VMstore for storage
• SQL VMs: Windows 2012 R2 + SQL Server 2014; size varied for CPU & RAM
• HammerDB in a master/slave config: 10 client VMs @ 8 vCPUs each, 16 virtual users per client, against a 24-vCPU SQL VM with 224 GB RAM

Determine NUMA Configuration from Windows

• Task Manager can show you NUMA nodes by right-clicking the CPU graph

[Screenshots: Task Manager CPU view switched to show NUMA nodes]

• Resource Monitor (CPU tab) shows more detailed info about the CPUs and which NUMA node they belong to

[Screenshot: Resource Monitor CPU tab with per-node CPU listing]
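If you prefer the command line, Sysinternals Coreinfo can dump the same mapping (a sketch; the -n switch limits the output to NUMA information):

coreinfo -n

Each NUMA node is listed along with the logical processors that belong to it.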

Check NUMA in SQL

select * from sys.dm_os_schedulers

Check NUMA in SQL (cont’d)

select * from sys.dm_os_memory_nodes
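A more targeted pair of queries (a sketch built on standard columns of those same DMVs):

select parent_node_id, count(*) as visible_schedulers
from sys.dm_os_schedulers
where status = 'VISIBLE ONLINE'
group by parent_node_id

select memory_node_id, virtual_address_space_committed_kb
from sys.dm_os_memory_nodes

The first counts schedulers per NUMA node as SQL Server sees them; in the second, memory node 64 belongs to the dedicated admin connection (DAC) and can be ignored.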

Checking NUMA on Host (ESXTOP)

[Screenshots: esxtop memory view (m), with f to choose fields and g to add the NUMA statistics fields]
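The NUMA statistics columns worth watching in that view (standard esxtop NUMA fields): NHN (the VM’s current home node), NMIG (NUMA migrations since power-on), NRMEM and NLMEM (remote vs local memory in MB), and N%L (percentage of the VM’s memory that is local – you want this at or near 100).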

NUMA Node Balancing

• A NUMA imbalance occurs when there is a mismatch between a virtual machine’s vCPU count and memory and the physical hardware
• Since a NUMA node is a collection of CPU and memory resources, make sure you size against both
• Two NUMA nodes means the memory is split in half
• Example: a 2-socket, 12-core host with 384 GB of memory has two NUMA nodes of 12 cores and 192 GB each. A VM with 12 vCPUs and 256 GB of memory will get two vNUMA nodes, each with 6 vCPUs and 128 GB of memory
• VMware rarely creates an imbalance when it auto-configures vNUMA

[Diagram: a NUMA node outlined around one CPU and its memory controller in a four-socket layout]

NUMA Penalty

• The NUMA scheduler wants to run a thread on the CPU where that thread’s memory is located
• When a thread runs but the memory it needs is in another NUMA node, a remote memory lookup occurs
• That remote lookup has a cost, known as the NUMA penalty

NUMA Penalty (cont’d)

• In our testing (HammerDB workload), we found the penalty to be as great as a 40% drop in performance!
• The penalty varies by workload

VM Sizing

• Size a VM to fit inside a single NUMA node for best performance
• Right-size your workloads
• For best CPU scheduling, size virtual machines so the vCPU count divides evenly into the number of cores per processor
• Example: 12-core processors work best with virtual machines sized at 1, 2, 3, 4, 6, or 12 vCPUs
• An 8-vCPU VM will still run, but you will lose consolidation ratio – and for most virtualized SQL Servers, consolidation is not the main goal
• Large machines run best as multiples of the core count: 12, 24, 36 vCPUs
• Remember to leave room for the hypervisor

https://www.vmware.com/techpapers/2017/Perf_Best_Practices_vSphere65.html

Virtual Nodes – 24 Cores on a 16-Core CPU

[Screenshot: a 24-vCPU VM’s virtual nodes spanning the two 16-core pNUMA nodes]

Cores vs Sockets

• 1 core per socket (“wide”) gives the CPU scheduler the most flexibility, BUT can have a negative impact when interpreted by software
• vSphere determines the best vNUMA topology for a VM on first boot; this is recorded in the .vmx file
• Changing away from 1 core per socket locks in the vNUMA configuration, and vSphere can no longer update it (autosize settings are ignored)
• Use multiple cores per socket to save on licensing for applications licensed per socket
• If you are sure of the underlying hardware, you can change these settings to match the NUMA boundaries (recommended)
• If you want a non-standard NUMA configuration, this is where you change it
• Results vary; you need to test to validate – each workload is impacted by NUMA differently

http://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf

Cores & Sockets – VM Settings

• cpuid.coresPerSocket = 1 (default)
  • Determines the number of virtual cores per socket

• numa.vcpu.followcorespersocket = 0 (default) – NEW IN vSPHERE 6.5
  • If set to 1, reverts to the old behavior of tying virtual NUMA node sizing to cpuid.coresPerSocket
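As an illustration (hypothetical values for a 16-vCPU VM on a host with two 8-core pNUMA nodes), presenting 2 sockets x 8 cores to match the hardware looks like this in the .vmx:

numvcpus = "16"
cpuid.coresPerSocket = "8" (2 virtual sockets x 8 cores = 16 vCPUs, matching the pNUMA boundary)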

Cores & Sockets – vSphere 6.5 “Green Line” Configurations

[Table: recommended socket/core combinations per host configuration; see the blog post below]
https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html

VM Advanced vNUMA Settings

• numa.vcpu.maxPerVirtualNode = 8 (default)
  • Adjust to span additional virtual NUMA nodes

• numa.vcpu.preferHT = FALSE (default)
  • Enable if you want to use hyper-threaded cores and fewer NUMA nodes

• numa.vcpu.min = 9 (default)
  • Threshold at which vNUMA takes effect

• numa.autosize.once = TRUE (default)
  • Recommended: FALSE – recalculates vNUMA on every power cycle

• numa.autosize = FALSE (default) – DEPRECATED in vSphere 6.5
  • Change to TRUE to have the VM recalculate vNUMA on every power cycle – *RECOMMENDED*

Auto-Generated Settings – LOOK, BUT DON’T TOUCH!

• numa.autosize.cookie = [auto-generated value]
  • What VMware calculated as your vNUMA config
  • e.g. 160001 = 16 sockets, 1 core each

• numa.autosize.vcpu.maxPerVirtualNode = [auto-generated value]
  • How many cores per NUMA node, based on the autosize
  • 8 in our example – the boundary of the host we were using (AMD, 2 sockets x 16 cores)

NOTE: As of vSphere 6.5 (and the latest patches of vSphere 6.0), these settings are no longer visible in the UI, but can still be found in the .vmx file

VM Advanced vNUMA Settings (cont’d)

TIP: You can’t see advanced config settings while a VM is running…
…but you CAN access the .vmx file via the CLI or the Datastore Browser!

What Does Auto-Sized NUMA Look Like?

numa.autosize.vcpu.maxPerVirtualNode = 12 (or 24, or 8, or …?)
numa.autosize.cookie = 240001

Note: If cpuid.coresPerSocket or numa.vcpu.maxPerVirtualNode is present in a VM’s .vmx file, autosize is ignored

NUMA AutoSize

• Tested on a 4-NUMA-node system

[Screenshot: autosize results on the 4-node system]

VMware Hot-Add Gotchas

• Turning on CPU Hot Add disables vNUMA
• Memory Hot Add works fine, with one caveat:
  • On VMware hardware versions 8–10, adding memory to a vNUMA machine only added it to NUMA node 0
  • You would then have a NUMA memory imbalance
  • A power cycle of the virtual machine is required to correct the imbalance
  • Hardware version 11+ (vSphere 6.0+) balances the memory across nodes as it is added

[Diagram: hot-added RAM landing on a single node vs being spread across all four nodes]
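In the .vmx file these toggles appear as vcpu.hotadd and mem.hotadd (a reference sketch; TRUE/FALSE values):

vcpu.hotadd = "FALSE" (keep off – enabling it disables vNUMA)
mem.hotadd = "TRUE" (safe on hardware version 11+)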

Update NUMA Configuration

• The NUMA topology for a virtual machine is calculated at first power-on
• It only updates when you change the number of cores
• When a vMotion moves the VM to hardware with a different underlying NUMA configuration, the topology is NOT updated
• Three scenarios after such a vMotion:
  • The vNUMA node size is smaller than or equal to the new host’s pNUMA node size: no real change
  • The vNUMA node is larger and does not divide evenly into the new pNUMA nodes: vNUMA is basically disabled
  • The vNUMA node is larger but divides evenly: it is split up to match, but the guest OS will not know the true memory locality
• To force the NUMA topology to be recalculated, add two settings to the advanced section of the VM:
  • numa.autosize = TRUE
  • numa.autosize.once = FALSE

http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf

Prefer HT

Hyper-threading: doubling the number of processing threads per core

• Off by default
• Exists as both a host setting and a VM setting; if you use it, set it at the VM level in nearly all cases
• Only turn it on when you have more vCPUs than the NUMA node size but the VM’s memory still fits into one NUMA node
• This allows all threads to schedule on one processor, keeping all memory local
• Workloads with lots of inter-thread communication will benefit
• Mileage may vary; test your workload each way – the answer depends on the value of local memory vs having a full CPU cycle

https://blogs.vmware.com/vsphere/2014/03/perferht-use-2.html
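Per VM, this is a single advanced setting (shown with an illustrative value):

numa.vcpu.preferHT = "TRUE" (count hyper-threads toward the NUMA node size for this VM)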

vNUMA Host Settings

• In nearly all cases, DO NOT TOUCH!
• These mostly govern when and how a host will move a VM from one NUMA node to another
• Most large virtual machines are not impacted, as they don’t move
• Upon boot, a VM is assigned a NUMA node (or nodes)
• If too many VMs running on one NUMA node cause CPU pressure, ESXi will move a VM between nodes
• CPU threads move instantly; memory migrates slowly
• ESXi also tries to keep VMs that communicate with each other over the network on the same node, for improved network speed
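For reference only (look, don’t touch): these knobs live on the host under Advanced System Settings as the Numa.* options, for example Numa.RebalancePeriod, which controls how often the scheduler re-evaluates node placement.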

NUMA in BIOS

• NUMA can be turned off in the hardware BIOS – ensure it is enabled
• Every hardware vendor seems to call it something slightly different; “Node Interleaving” is the most common name
• Node interleaving OFF means NUMA is ON
• Node interleaving ON gives an SMP configuration
• Most servers have NUMA enabled by default

Before You Blame NUMA…

• An important finding throughout this testing is how much impact database optimization can have!
• More importantly, how negative the impact of NOT optimizing your database can be
• HammerDB (our sample application) grinds to a crawl after prolonged use… optimization can breathe new life into it!
• In our case: 1.25 million TPM down to <1,000 TPM!
• DB size: 200 GB (2,000 warehouses) -> 245 GB -> 375 GB (optimized)
• NUMA should be one of the last things you look at if 1 core per socket is set

Additional Screenshots…

[Screenshots only; no recoverable text]

Closing Comments

• When in doubt, DON’T TOUCH IT!
• This topic only applies to very large VMs that don’t fit into a NUMA node and require maximum performance
• If you think you have a handle on NUMA, that may be even more dangerous!