Don’t keep your CPU waiting: Speed reading for machines
John.Baker@EPStrategies.com
www.pivotor.com [email protected]
z/OS Performance Education, Software, and Managed Service Providers
Agenda
• Storage vs storage
• CPU and processor caches
• Storage tiers
• Sync vs Async
• Basic cache principles
• Improving sync access
• Hiperdispatch and DAT
• Improving async access
• DSS stats
• RMF stats
• zHyperLink
Storage vs Storage
All roads lead to the CPU
• z14: 5.2 GHz = .192 ns cycle time
How far can I go in .192 ns?
• Speed of light = 300,000 km/sec
• = 30 cm/ns
• 30 cm ≈ 12 inches; 12 inches × .192 ≈ 2.3 inches
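The arithmetic above can be checked in a few lines of Python (a sketch; 30 cm/ns is the usual rounding of the speed of light):

```python
# How far light travels in one z14 clock cycle.
clock_hz = 5.2e9                      # z14 clock frequency
cycle_ns = 1e9 / clock_hz             # nanoseconds per cycle, ~ .192
c_cm_per_ns = 29.98                   # speed of light: ~30 cm per nanosecond
distance_cm = c_cm_per_ns * cycle_ns  # distance light covers in one cycle
distance_in = distance_cm / 2.54      # convert to inches
print(f"{cycle_ns:.3f} ns per cycle; light travels {distance_in:.1f} inches")
```

Electrical signals in silicon are slower still, which is why physical distance to data matters so much.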
z14 PU Chip
• 14 nm technology
• 6.14 billion transistors
• 23.2 km or 14.4 miles of wire
• Up to 10 cores
• L3 cache on chip
http://www.redbooks.ibm.com/abstracts/sg248451.html?Open
Processor Caches
L1: On-Core
– 128 KB Instruction, 128 KB Data
L2: On-Core
– 2 MB Instruction, 4 MB Data
L3: 64 MB on SCM chip
– One shared per 10 cores
L4: 672 MB per drawer
Main Memory:
– 8 TB (multiple DIMMs) per drawer
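To see why this tiering pays off, here is a small Python sketch of effective access time across a hierarchy. The hit rates and cycle latencies below are invented round numbers for illustration, not z14 measurements:

```python
# Illustrative only: average access time stays close to L1 speed as long
# as each level catches most of the misses from the level above it.
levels = [           # (name, hit rate at this level, latency in cycles)
    ("L1",  0.95,   1),
    ("L2",  0.80,  10),
    ("L3",  0.80,  50),
    ("L4",  0.80, 200),
    ("Mem", 1.00, 500),
]
avg, reach = 0.0, 1.0    # reach = fraction of accesses that get this far
for name, hit, latency in levels:
    avg += reach * hit * latency   # cost of accesses satisfied here
    reach *= (1 - hit)             # the rest fall through to the next level
print(f"average access ~ {avg:.1f} cycles")
```

With these made-up numbers the average lands near 2 cycles despite memory costing hundreds, which is the whole argument for deep cache hierarchies.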
Storage Tiers
• Speed/$$$ (sync): Core L1/L2 → L3 → L4 → Memory
• Capacity/$ (async): Aux (SCM/VFM) → Disk cache → SSD → HDD → Tape
Caching Principles
• Temporal locality: data that is accessed once will likely be accessed again
– Try to hold data in cache as long as possible (LRU)
– Larger caches can hold more
• Spatial locality: data near the target data will likely be accessed as well
– Cache lines and sequential prefetch
• CPU performance goal: access L1 cache in a single clock cycle
– L1 cache must not be so large that it prevents this
Improving Sync Access
• Measure! Are you collecting 113’s?
– Cycles per instruction, L1 miss percentage, RNI
• Configuration
– Small LPARs, logical/physical CP ratio
– Hiperdispatch: avoid Vertical Low CPs
• Use Large Frames for DB2
– Fewer segment/page table entries
– Reduce TLB misses
• Data-in-memory tech
– Reduce cycles to fetch busy tables
• Applications
– Use current compilers
– Check your old assembler
Processor Cache Measurements
• Cycles per instruction
• L1 miss percent
• Relative Nest Intensity (RNI)
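These metrics reduce to simple ratios over hardware counters. The counter values below are invented for illustration; in practice they come from SMF 113 records, and RNI is omitted here because its weighting formula differs by machine generation:

```python
# Hypothetical counter values -- real ones come from SMF 113 records.
cycles       = 1_000_000_000
instructions =   450_000_000
l1_misses    =    18_000_000

cpi         = cycles / instructions            # cycles per instruction
l1_miss_pct = 100 * l1_misses / instructions   # L1 misses per 100 instructions
print(f"CPI = {cpi:.2f}, L1 miss % = {l1_miss_pct:.1f}")
```

A rising CPI with flat instruction mix is the classic sign of the CPU waiting on the cache/memory nest rather than doing work.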
Improving Sync Access
• Configuration
– Small LPARs
– Logical/physical CP ratio
– Hiperdispatch: avoid Vertical Low CPs
Pre-Hiperdispatch: “Horizontal” alignment

Total weight = 1000; 15 physical CPs

LPAR    Weight   LCPs   Share
LPAR1   350      7      350/1000 = 35%
LPAR2   100      3      100/1000 = 10%
LPAR3   100      3      100/1000 = 10%
LPAR4   100      3      100/1000 = 10%
LPAR5   200      4      200/1000 = 20%
LPAR6   100      3      100/1000 = 10%
LPAR7   50       2      50/1000 = 5%

• Weight is a guaranteed minimum
• May be exceeded if other LPARs do not use their full share
• Subject to the number of LCPs (which also limits the maximum)
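The share arithmetic in the table reduces to weight divided by total weight, applied to the physical CP pool; a quick Python check:

```python
# Horizontal-mode LPAR shares: guaranteed minimum = weight / total weight.
weights = {"LPAR1": 350, "LPAR2": 100, "LPAR3": 100, "LPAR4": 100,
           "LPAR5": 200, "LPAR6": 100, "LPAR7": 50}
physical_cps = 15
total = sum(weights.values())                  # 1000
for lpar, w in weights.items():
    share = w / total
    print(f"{lpar}: {share:.0%} share = {share * physical_cps:.2f} CPs")
```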
HiperDispatch
• A method of “aligning” physical CPUs with LPARs
• Goal is to take advantage of newer hardware design
• Also reduces multiprocessing overhead (“MP Effect”)
• Default as of z/OS 1.13

HIPERDISPATCH=NO (“Horizontal” alignment):
– % share distributed across all available physical CPUs
– All CPs distributed among all LPARs
– Utilization of all CPs tends to be even

HIPERDISPATCH=YES (“Vertical” alignment):
– % share distributed by physical CP
– LCPs become “Vertical High, Vertical Medium, Vertical Low”
– Unused (Vertical Low) CPs are “parked”
Vertical Alignment
• LPAR1 example: 35% share, 7 LCPs
• Entitlement: .35 * 15 CPs (cores) = 5.25 CPs
• Rules: 1.0 = VH, .5 to .99 = VM, below .5 = VL ***
• First, every LPAR receives at least one VM (.5 CPs): 5.25 - .5 = 4.75
• Four whole CPs become VHs: 4.75 - 4 = .75
• .75 is at least .5, so one more VM: .75 - .5 = .25
• The remaining .25 (less than .5) becomes a VL
• Result: 4 VH, 2 VM, 1 VL
https://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD106389
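The walkthrough above can be sketched as a simplified calculation (PR/SM’s real algorithm has more rules; see the linked techdoc):

```python
# Simplified vertical-CP assignment for the LPAR1 example above.
share, physical_cps, lcps = 0.35, 15, 7

entitlement = share * physical_cps       # 5.25 CPs
remaining = entitlement - 0.5            # every LPAR first gets one VM (0.5)
vh = int(remaining)                      # whole CPs become Vertical Highs
remainder = remaining - vh               # fractional CPs left over
vm = 1 + (1 if remainder >= 0.5 else 0)  # the reserved VM, plus one more here
vl = lcps - vh - vm                      # remaining logical CPs are Vertical Low
print(f"{vh} VH, {vm} VM, {vl} VL")      # → 4 VH, 2 VM, 1 VL
```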
PR/SM Association
• LPARs and Logical Processors are associated with drawers, nodes, and Physical Processors
Hiperdispatch Config
Parked Processors
Improving Sync Access
• Consider Large Frames for large, long-running DB2 work
– Dynamic Address Translation (DAT) costs cycles
– The Translation Lookaside Buffer (TLB) has grown smaller as a % of total memory
– Large (1M) and giant (2G) frames require fewer TLB entries
– Fewer segment/page table entries
– Reduced TLB misses
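The TLB saving is easy to quantify; a sketch for a hypothetical 1 GB region (the region size is invented for illustration):

```python
# Mapping the same region with 1M frames vs 4K pages.
region_mb = 1024                       # e.g. a 1 GB DB2 buffer pool
pages_4k  = region_mb * 1024 // 4      # number of 4 KB pages
frames_1m = region_mb // 1             # number of 1 MB large frames
print(f"{pages_4k} x 4K pages vs {frames_1m} x 1M frames "
      f"({pages_4k // frames_1m}x fewer TLB entries)")
```

Each mapped unit competes for a TLB slot, so 256x fewer translations means far fewer TLB misses for the same data.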
DAT Cost
• Translation Lookaside Buffer (TLB) miss percent
Improving Sync Access
• Data-in-memory tech
– Reduce cycles to fetch busy tables
– Improve response, at a cost
• Applications
– Can no longer count on large CPU clock gains
– Separate instruction/data caches offer benefits and drawbacks
– Compilers will take care of it: use current compilers!
– Check your old assembler
Improving Async Access
• The Best I/O…
– Buy enough memory to avoid paging
– SCM: flash storage on EC12/z13; replaced by VFM (“expanded memory”) on z14
– Buffers: enough, and the right type (random VSAM)
Aux Storage Use
• Running out of auxiliary storage can lead to a very bad day…
Improving Async Access
• DSS hit ratio
– If less than 90% (for most workloads), consider more cache
– Law of diminishing returns applies
– Consider the I/O “flavor”: random/sequential, read/write
– Some workloads are inherently “cache unfriendly”
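The 90% rule of thumb is a one-line check; the counts below are invented, with the real ones coming from DSS/RMF cache statistics:

```python
# Hypothetical read counts -- real values come from cache statistics.
read_hits, read_total = 880, 1000
hit_ratio = 100 * read_hits / read_total
if hit_ratio < 90:                     # the slide's rule of thumb
    print(f"read hit ratio {hit_ratio:.1f}% -- consider more cache")
```

Remember the diminishing-returns caveat: going from 88% to 90% halves the misses that remain above 90%, but each further point of hit ratio costs progressively more cache.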
DSS Cache Read Miss %
• Healthy read hit ratios should exceed 90%
Random Reads only
• Isolating random reads, by removing their sequential “cache friendly” brethren, gives more insight
Random reads are also referred to as “Normal” reads.
Improving Async Access
• HDD configuration
– 15K, 10K, 7.2K RPM HDDs
– Consider workload access density
– SSD for cache-unfriendly workloads
• DSS auto-tiering or all flash
Disk Activity by Type
• Ideally, the SSDs are supporting the bulk of the workload
Response by Disk Type
• SSD benefit is marginal here
Improving Async Access
• Software striping
– Storage Class: sustained data rate
– More granular
– May span storage subsystems
• Compression
– zEDC ends the debate
Improving Async Access (2)
• IOSQ
– SuperPAV
– Old-fashioned dataset placement (SMF 42)
• PEND
– CMR: check the DSS front end
– DB: hardware reserves
• DISCONNECT
– DSS hit ratio
– SSD
• CONNECT
– zHPF, channels, HAs
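These four components simply add up to device response time, and the largest one tells you where to look first. The millisecond values below are invented for illustration:

```python
# Hypothetical response-time breakdown, in milliseconds.
components = {"IOSQ": 0.05, "PEND": 0.03, "DISC": 0.25, "CONN": 0.12}
total_ms = sum(components.values())
worst = max(components, key=components.get)   # biggest contributor
print(f"response = {total_ms:.2f} ms, dominated by {worst}")
```

Here DISCONNECT dominates, which per the list above would point at the DSS hit ratio (or a move to SSD) rather than at PAVs or channels.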
Response Time Components
• Sub-millisecond is the new standard
Something new: zHyperLink
http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/redp5186.html?Open
zHyperLink Requirements
• z14
• z/OS 2.1
• DS888x with zHyperLink feature (FC #0431)
• DB2 V11+ only (for now)
zHyperLink Flow
• Conditions
• Why not make everything sync?
http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/redp5186.html?Open
“External memory”
• Sync: Core L1/L2, L3, L4, Memory
• Async: Aux (SCM/VFM), Disk cache, SSD, HDD, Tape
• With zHyperLink, a disk-cache hit moves into the synchronous tier: SYNC!