VIRT1773BU
Deep Dive on pNUMA & vNUMA: Save your SQL VMs from certain Doom!
Rob Girard, Principal TME
#VMworld #VIRT1773BU
Shawn Meyers, SQL Server Principal Architect
VMworld 2017 Content: Not for publication or distribution
Virtualizing Applications Track Sessions and Offerings
• 30 Breakout Sessions with 2 Panels & 3 Quick Talks
• 10 BCA Meet-The-Experts sessions (15min 1-on-1 appts)
• 2 Birds-of-a-Feather special invitation receptions (Oracle & SAP)
• 5 Group Discussions
• 3 Saturday Full-Day Applications Bootcamps
• Sign up for the Independent Oracle User Group (IOUG) VMware Special Interest Group (SIG): www.ioug.org/vmware
The Percentage of Applications in Virtualized Infrastructure Has Increased Dramatically Over the Last Few Years
(VMware Core Metrics Survey 2016)

% Respondents Running the Application in Virtualized Infrastructure (N = 1024)

Application                            Total | NA   EU   AP   BRIC | SMB  COMM  ENT
Microsoft SQL                           81%  | 80%  81%  75%  84%  | 75%  81%   86%
Custom/Industry-Specific Business…      65%  | 57%  70%  66%  71%  | 59%  70%   68%
Microsoft Exchange                      53%  | 52%  55%  49%  58%  | 48%  51%   60%
Microsoft SharePoint                    52%  | 61%  44%  43%  51%  | 41%  56%   60%
SAP                                     46%  | 36%  51%  48%  55%  | 32%  45%   59%
Oracle Databases                        33%  | 32%  29%  40%  38%  | 32%  35%   34%
IBM Middleware                          30%  | 38%  22%  24%  31%  | 24%  33%   34%
Oracle Applications                     29%  | 26%  28%  30%  36%  | 24%  37%   30%
High Performance Computing              29%  | 18%  29%  41%  40%  | 21%  31%   35%
Oracle Middleware                       22%  | 19%  20%  26%  29%  | 18%  24%   26%
Respondents (n)                              | 388  289  139  208  | 401  217   406

(NA/EU/AP/BRIC = region; SMB/COMM/ENT = company size)
Where Can I Learn More?
▪ Business Critical Applications homepage on VMware.com
• https://www.vmware.com/solutions/business-critical-apps.html
▪ VMware – DellEMC Collaborative Collateral and DBTA Surveys
• http://www.dbta.com/emc
▪ Blogs
• vSphere Blog
• https://blogs.vmware.com/vsphere/
• One Stop Shop - All Oracle on VMware SDDC
• https://blogs.vmware.com/apps/2017/01/oracle-vmware-collateral-one-stop-shop.html
• VMware IOUG Special Interest Group
• http://vmsig.org/
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Disclaimer
About Shawn
Shawn Meyers
• SQL Server Principal Architect, practice lead
• Experience in VMware, Microsoft, SQL Server, storage infrastructure, performance tuning.
• Working in IT since 1992, SQL Server since 1996, VMware since 2009
@1dizzygoose linkedin.com/in/shawnmeyers42
About Rob
Rob Girard
• Principal Technical Marketing Engineer @ Tintri as of Jan, 2014
• Working in IT since 1997 with >12 years of VMware experience
• vExpert, VCAP4/5-DCA, VCAP4-DCD, VCP2/4/5, MCSE, CCNA and TCSE
@robgirard www.linkedin.com/in/robgirard
• Always use a "Green Line" configuration to match optimized VM size to the underlying physical topology, while presenting the correct sockets & cores to the guest OS
• Leave Hot Add CPU off
• Adjust Virtual Machine Advanced Settings:
• numa.autosize.once FALSE
• numa.autosize TRUE (deprecated in vSphere 6.5, which defaults to TRUE)
– Leave everything else alone – VMware does a great job of managing vNUMA
• If you want to know why, what all the other knobs are & their impact, as well as our testing to prove these settings… STICK AROUND!
2 Minute Version
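In .vmx-style form, the two advanced settings above look like this (key names taken from the slide; confirm against your own .vmx before editing):

```
numa.autosize.once = "FALSE"
numa.autosize = "TRUE"
```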
Introduction
Met at SQL Elite Workshop, hosted by VMware and Tintri [April 2015]
Partnered to share expertise with different aspects of virtualization
Delivered VAP6433 Group Discussion session @ VMworld 2015
This session summarizes the research & lab behind that session
For those who want to understand how it works under the covers
Agenda
• Explain pNUMA and vNUMA
• How vNUMA works in VMware
• vNUMA balancing and boundaries
• Advanced vNUMA settings
• Lab results & findings
• Monitoring vNUMA
Non Uniform Memory Access (NUMA)
SMP vs NUMA
[Diagram: SMP – four CPUs sharing a single memory controller and I/O controller; NUMA – two groups of CPUs, each with its own memory controller and I/O controller, joined by an interconnect]
• Large physical machines ran into scale problems with memory access
• NUMA was created to divide up the memory address space between CPUs
There are 2 NUMA nodes per processor; a 4-socket server will have 8 NUMA nodes
AMD NUMA
[Diagram: four CPUs, each with its own memory controller, sharing an I/O controller]
Intel processors have one NUMA node per processor
Notice the QPI links between each CPU
Intel NUMA
[Diagram: four Intel CPUs, each with its own memory controller, connected by QPI links to two I/O controllers]
Intel Cluster On Die (COD)
• Performance impact to ESXi varies up to 35%, depending on workload, according to VMware
• Controlled in BIOS; OEM default recommended
• Applies to processors with 10 or more cores
• Available on Haswell (v3) and later
Graphic from https://www.starwindsoftware.com/blog/numa-and-cluster-on-die
pNUMA vs vNUMA
pNUMA: the NUMA architecture of the physical machine
vNUMA: the virtual NUMA topology presented to a virtual machine
vNUMA presents the pNUMA nodes to the virtual machine's OS
Since vNUMA is software, we can tune it when it does not automatically match the desired configuration
Windows, Linux, and SQL Server are all natively NUMA-aware and have been for a very long time
Soft NUMA
SQL Server has a concept called soft NUMA, which has been around for a long time
Changed in SQL Server 2016; it now creates logical NUMA nodes of up to 8 cores each
Works in conjunction with VMware vNUMA; it is not a substitute
SQL Server and Intel have both found 8 cores to be the magic number for optimal memory throughput
[Diagram: an application and guest OS running on the hypervisor across two CPU/memory-controller pairs]
• By default, vNUMA only comes into play when there are 9 vCPUs or more
• If you have 4- or 6-core processors in your host and VMs with more vCPUs than cores, you WILL have NUMA issues!
• Consider changing numa.vcpu.min on the virtual machine to allow vNUMA to take effect below this threshold
• This can be set at the VM level
• Introduced in vSphere 5.0, but improved in 5.5, 6.0 & 6.5
vNUMA
Test Methodology, Tools & Lab Setup
1. In-Guest analysis
2. Host memory usage analysis: ESXTOP (M for memory, f to choose fields, g for NUMA fields)
3. .vmx file analysis (to validate changes made via GUI, vMotions to other hardware, impact of reboot vs power cycle, FIRST BOOT vs others, etc.)
4. Worst-case analysis – pinning CPUs & memory to specific cores & nodes
• 1 x AMD Server: 2 x 16 core + 256 GB RAM
• 1 x Intel Server: 2 x 16 core + 384 GB RAM
• Tintri VMstore for storage
• SQL VMs - Win 2012 R2 + SQL 2014
• Size varied for CPU & RAM
• HammerDB
– Master/slave config: 10 VMs @ 8 vCPU each
– 16 virtual users per client against a 24 vCPU SQL VM w/ 224 GB RAM
Test Methodology, Tools & Lab Setup – Con’t
Lab:
• Task Manager can show you NUMA
nodes by right-clicking the graph
Determine NUMA configuration from Windows
• Resource monitor (CPU tab) shows more
detailed info about the CPUs and which NUMA
node they belong to
Determine NUMA configuration from Windows– Con’t
Check NUMA in SQL – Con’t
select * from sys.dm_os_memory_nodes
Check NUMA in SQL
select * from sys.dm_os_schedulers
Checking NUMA on Host (ESXTOP) – Con’t
• Host: 2 sockets, 12 cores each, 384 GB of memory
• Each NUMA node is 12 cores and 192 GB of memory
• A VM with 12 cores and 256 GB of memory will have two NUMA nodes, each with 6 cores and 128 GB of memory
NUMA Node Balancing
• A NUMA imbalance occurs when there is a mismatch between the number of CPUs and memory configured for a virtual machine and the physical hardware
• Since a NUMA node is a collection of CPU and memory resources, ensure your VMs are sized accordingly
• Two NUMA nodes means the memory is split in half
• VMware rarely creates an imbalance when it auto-configures NUMA
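The sizing arithmetic in the example above can be reproduced with a short Python sketch. This is our own deliberately simplified model (real ESXi autosizing is driven by settings like numa.vcpu.maxPerVirtualNode and the pNUMA core count), so treat it only as arithmetic for the slide's example:

```python
import math

def vnuma_layout(vcpus, vm_mem_gb, pnode_cores, pnode_mem_gb):
    """Sketch of splitting a wide VM into virtual NUMA nodes:
    use enough vNUMA nodes that each node's cores AND memory fit
    inside one physical NUMA node. Illustrative only, not ESXi code."""
    nodes = max(math.ceil(vcpus / pnode_cores),
                math.ceil(vm_mem_gb / pnode_mem_gb))
    return nodes, vcpus // nodes, vm_mem_gb / nodes

# Slide example: 2-socket, 12-core host with 384 GB (192 GB per
# pNUMA node); a VM with 12 vCPUs and 256 GB of RAM ends up with
# two vNUMA nodes of 6 cores and 128 GB each.
print(vnuma_layout(12, 256, 12, 192))  # (2, 6, 128.0)
```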
[Diagram: four CPUs with their memory controllers; one CPU plus its memory controller highlighted as a single NUMA node]
NUMA Penalty
• NUMA wants to schedule a thread on the CPU that owns the memory assigned to that thread
• When a thread runs but the memory it needs is in another NUMA node, a remote memory lookup occurs
• That remote lookup has a cost, which is known as the NUMA penalty
In our testing (HammerDB workload), we found the penalty to be as great as a 40% drop in performance!
Penalty varies by workload
NUMA Penalty – Con’t
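The effect of the penalty can be sketched with a toy throughput model. This is purely illustrative (the function, its name, and the linear scaling are our own assumptions; the slide's 40% figure is the worst case measured with HammerDB, and the real penalty varies by workload and hardware):

```python
def effective_throughput(base_tpm, remote_fraction, penalty=0.40):
    """Toy model: scale throughput by the fraction of memory traffic
    that is remote, using the worst-case 40% penalty observed with
    HammerDB. Illustrative only -- not a measured curve."""
    return base_tpm * (1 - remote_fraction * penalty)

# If every access were remote, 1,000,000 TPM would drop to 600,000.
print(effective_throughput(1_000_000, 1.0))  # 600000.0
```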
• An 8 vCPU machine will still run, but you will lose consolidation ratios
• For most SQL Server virtualization, consolidation is not the main goal
• Large machines run best when sized as multiples of the core count: 12, 24, 36 vCPUs
• Remember to leave room for the hypervisor
VM Sizing
• Example: 12-core processors work best with virtual machines sized 1, 2, 3, 4, 6, or 12 vCPUs
• Size a VM to fit inside a single NUMA node for best performance
• Right-size your workloads
• For best CPU scheduling, size all virtual machines so the vCPU count divides evenly into the number of cores in the processor
https://www.vmware.com/techpapers/2017/Perf_Best_Practices_vSphere65.html
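The "1, 2, 3, 4, 6, or 12" guidance is simply the set of divisors of the socket's core count. A quick sketch (our own helper, not a VMware tool):

```python
def clean_vcpu_sizes(cores_per_socket):
    """vCPU counts that divide evenly into one socket's cores, so a
    VM never straddles a NUMA node unnecessarily."""
    return [n for n in range(1, cores_per_socket + 1)
            if cores_per_socket % n == 0]

print(clean_vcpu_sizes(12))  # [1, 2, 3, 4, 6, 12]
```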
Virtual Nodes – 24 cores on a 16 core CPU
Cores vs Sockets
• 1 core per socket ("wide") allows the CPU scheduler the most flexibility in scheduling, BUT can have a negative impact when interpreted by software
• vSphere will determine the best NUMA topology for a VM on first boot; this is set in the .vmx file
• Changing from 1 core per socket locks in the vNUMA configuration; vSphere cannot update it (autosize settings are ignored)
• Use multiple cores per socket to save on licensing for applications licensed per socket
• If you are sure of the underlying hardware, you can change these settings to match NUMA boundaries (recommended)
• If you desire a non-standard NUMA configuration, you can change it here
• Results do vary; you need to test to validate, as each workload is impacted by NUMA differently
http://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf
• cpuid.coresPerSocket = 1 (default)
• Determines number of virtual cores per socket
Cores & Sockets – VM Settings
numa.vcpu.followcorespersocket = 0 (default) – NEW IN vSPHERE 6.5
• If set to 1, reverts to the old behaviour of virtual NUMA node sizing being tied to cpuid.coresPerSocket
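As an illustration of matching these settings to hardware, a 24-vCPU VM on hosts with 12-core sockets could present a matching topology with .vmx-style entries like these (example values for that specific hardware, not defaults to copy blindly):

```
cpuid.coresPerSocket = "12"
numa.vcpu.followcorespersocket = "0"
```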
Cores & Sockets – vSphere 6.5 “Green Line” Configurations
https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html
numa.vcpu.maxPerVirtualNode = 8 (default)
• Used to span additional NUMA nodes
numa.vcpu.preferHT = False (default)
• Enable if you want to use HT cores and fewer NUMA nodes
numa.vcpu.min = 9 (default)
• Threshold for when vNUMA will take effect
numa.autosize.once = True (default)
• Recommended: False – changes behavior to recalculate vNUMA on every power cycle
numa.autosize = False (default) – DEPRECATED (v6.5)
• Change to True to have the VM recalculate vNUMA on every power cycle – *RECOMMENDED*
VM Advanced vNUMA Settings – con’t
numa.autosize.cookie = [auto-generated value]
• What VMware calculated as your vNUMA config
• (160001) = 16 sockets, 1 core each
numa.autosize.vcpu.maxPerVirtualNode = [auto-generated value]
• How many cores per NUMA node, based on the autosize
• 8 shown in the example – the boundary of the host we are using (AMD, 16 cores x 2 sockets)
Auto-Generated Settings – LOOK, BUT DON'T TOUCH!
NOTE: As of vSphere 6.5 (and the latest patches of vSphere 6.0), these settings are no longer visible in the UI, but can still be found in the .vmx file
VM Advanced vNUMA Settings – con't
TIP: You can't see Advanced config settings while a VM is running…
…but you CAN access the .vmx file via CLI or the Datastore Browser!
What Does Auto-Sized NUMA Look Like?
Note: If cpuid.coresPerSocket or numa.vcpu.maxPerVirtualNode is present in a VM’s VMX file,
Autosize is ignored
numa.autosize.vcpu.maxPerVirtualNode= 12 (or 24 or 8 or ….?)
numa.autosize.cookie= 240001
• Tested on a 4-NUMA-node system
NUMA AutoSize
VMware Hot-Add Gotchas
• When you turn on CPU hot add, it will disable vNUMA
• Memory Hot-Add works fine, with one caveat
• In VMware hardware versions 8-10, adding memory to a vNUMA machine only added it to NUMA node 0
• You would then have a NUMA memory imbalance
• A power cycle of the virtual machine is required to correct the imbalance
• Hardware versions 11+ (vSphere 6.0+) balance the memory as it is added
[Diagram: memory hot-added (+RAM) to a VM with four NUMA nodes being distributed across all four nodes]
• numa.autosize TRUE
• numa.autosize.once FALSE
Update NUMA Configuration
• NUMA for a virtual machine is calculated at first power-on
• It only updates when you change the number of cores
• When a vMotion occurs between hardware with different underlying NUMA configurations, the topology is not updated
• To force a review and/or update of the NUMA topology, add the two settings above to the advanced section of the VM
http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
• Three scenarios after moving to different hardware:
• The VM's NUMA node size is smaller than or equal to the new host's: no real change
• The VM's NUMA node size is larger and not evenly divisible: NUMA is basically disabled
• The VM's NUMA node size is larger but evenly divisible: the NUMA node is divided up to match, however the OS will not know the memory locality
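The three post-vMotion outcomes can be sketched as a small decision function. This is our own illustrative encoding of the rules on this slide, not ESXi logic:

```python
def vmotion_numa_outcome(vm_node_cores, host_node_cores):
    """Classify what happens to a VM's vNUMA topology after a
    vMotion to a host with a different pNUMA node size.
    Illustrative sketch of the three scenarios, not ESXi code."""
    if vm_node_cores <= host_node_cores:
        return "no real change"            # vNUMA node still fits
    if vm_node_cores % host_node_cores == 0:
        return "divided up, locality hidden from the OS"
    return "vNUMA basically disabled"

print(vmotion_numa_outcome(8, 12))   # no real change
print(vmotion_numa_outcome(12, 6))   # divided up, locality hidden from the OS
print(vmotion_numa_outcome(10, 6))   # vNUMA basically disabled
```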
Prefer HT
• Off by default
• Host setting and VM setting
• If using it, set it at the VM level in nearly all cases
• Only turn it on when you have more vCPUs than the NUMA node size, but your memory still fits into one NUMA node
• This will allow all threads to schedule on one processor, with all memory local
• Workloads with lots of inter-thread communication will benefit
• Mileage may vary; test your workload each way – the answer will depend upon the value of local memory vs. having a full CPU core
https://blogs.vmware.com/vsphere/2014/03/perferht-use-2.html
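If you do decide to test preferHT for a single VM, the slide's guidance translates to one advanced setting in that VM's configuration (shown here as a .vmx-style line; verify the exact form in your environment before applying):

```
numa.vcpu.preferHT = "TRUE"
```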
Hyper-threading: Doubling the number of processing threads per core
vNUMA Host Settings
• In nearly all cases do not TOUCH!!!
• Mostly covers when and how a host will change a VM from one NUMA node to another
• Most large virtual machines are not impacted by this as they don’t change
• Upon VM boot it is assigned a NUMA node or nodes
• If too many VMs are running on one NUMA node causing CPU pressure, ESXi will move a VM between nodes
• CPU threads move instantly; memory moves slowly
• ESXi will try to keep VMs communicating over the network with each other together for improved network speed
• Node interleaving off means NUMA is on
• Node interleaving on gives an SMP configuration
NUMA in BIOS
• NUMA can be turned off in the hardware BIOS – ensure it is enabled
• Every hardware vendor seems to call it something slightly different; "Node Interleaving" is the most common name
• Most have NUMA enabled by default
Before you blame NUMA….
• An important finding throughout this testing is how much impact database optimization can have!
• More importantly, how negative the impact of NOT optimizing your database can be
• HammerDB (a sample application) grinds to a crawl after prolonged use… optimization can breathe new life into it!
• In our case: 1.25 million TPM down to <1,000 TPM!!!!
• DB size: 200 GB (2,000 warehouses) -> 245 GB -> 375 GB (optimized)
• NUMA should be one of the last things you look at if 1 core per socket is set
Closing Comments
When in doubt, DON'T TOUCH IT!
This topic only applies to very large VMs that don't fit into NUMA nodes and require maximum performance
If you think you have a handle on NUMA, that may be even more dangerous!