Webinar OpenVMS i2ServerPerformance
-
8/8/2019 Webinar OpenVMS i2ServerPerformance
© 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Rafiq Ahamed K, Technical Expert, OpenVMS
3rd December 2010
OpenVMS V8.4 Performance on New i2 Server
-
Agenda
New i2 Server Quick Introduction
Performance of new i2 Servers
V8.4 Performance Features and Improvements
OpenVMS Guest Performance Summary
Q & A
-
The performance results shared in this session are from an engineering test environment; they do not represent any specific customer workload. Your mileage may vary.
-
New i2 Server Quick Introduction
-
Platform Evolution
All CPUs contend for the same memory controller (MC); possible memory controller bottleneck
Added more FSBs for scalability
Low-latency, high-bandwidth, linearly scalable QPI fabric
Source: Intel Corporation
-
The BL8x0c i2 Product Family
BL860c i2 BL870c i2 BL890c i2
-
Introducing Next Generation i2 Servers
Scale more simply: scale up, out and within
Simplified scalability with the industry's first 2-, 4- and 8-socket OpenVMS server blades; now even small systems use NUMA
Combines multiple blades into a single, scalable system
Processor: 2s/8c x 2 = 4s/16c; 4s/16c x 2 = 8s/32c
BL860c i2, BL870c i2, BL890c i2: NUMA-aware servers
-
Introducing New Integrity Server Blades: BL860c i2, BL870c i2 and BL890c i2
Processor (all models): Intel Itanium processor 9300 series (quad-core and dual-core*)
Processors/Cores
  BL860c i2: up to 2 processors/8 cores; up to 2 processors/4 cores*
  BL870c i2: up to 4 processors/16 cores
  BL890c i2: up to 8 processors/32 cores
Chipset (all models): Intel Boxboro chipset (I/O Hub)
Memory (industry-standard DDR3 technology)
  BL860c i2: 24 DIMM slots; max 192GB (w/8GB), 384GB (w/16GB*)
  BL870c i2: 48 DIMM slots; max 384GB (w/8GB), 768GB (w/16GB*)
  BL890c i2: 96 DIMM slots; max 768GB (w/8GB), 1.5TB (w/16GB*)
Internal Storage (HW RAID 0/1 controller standard on all models)
  BL860c i2: 2 hot-plug SFF SAS HDDs
  BL870c i2: 4 hot-plug SFF SAS HDDs
  BL890c i2: 8 hot-plug SFF SAS HDDs
Networking (integrated)
  BL860c i2: 4 x 10 GbE (Flex-10) NICs
  BL870c i2: 8 x 10 GbE (Flex-10) NICs
  BL890c i2: 16 x 10 GbE (Flex-10) NICs
Mezzanine Slots
  BL860c i2: 3 PCIe slots
  BL870c i2: 6 PCIe slots
  BL890c i2: 12 PCIe slots
Management (all models): Integrity Integrated Lights-Out 3 (iLO 3) Advanced Pack (standard)
Density
  BL860c i2: 8 server blades in c7000; 4 server blades in c3000
  BL870c i2: 4 server blades in c7000; 2 server blades in c3000
  BL890c i2: 2 server blades in c7000; 1 server blade in c3000
* Future Support
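The maximum-memory figures for the three blades follow from DIMM slot count multiplied by DIMM size. A minimal Python sketch of that arithmetic is shown here; the dictionary and function names are illustrative and do not come from the slides.

```python
# DIMM slot counts per blade model, as listed on the slide.
DIMM_SLOTS = {"BL860c i2": 24, "BL870c i2": 48, "BL890c i2": 96}


def max_memory_gb(model: str, dimm_size_gb: int) -> int:
    """Return the maximum memory in GB when every slot holds one DIMM of the given size."""
    return DIMM_SLOTS[model] * dimm_size_gb


if __name__ == "__main__":
    for model in DIMM_SLOTS:
        print(f"{model}: {max_memory_gb(model, 8)} GB with 8GB DIMMs, "
              f"{max_memory_gb(model, 16)} GB with 16GB DIMMs")
```

With 16GB DIMMs the BL890c i2 works out to 1536 GB, which is the 1.5TB quoted on the slide.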
-
BL8x0c i2 Blade Architecture
Intra-blade 19.2 GB/s, inter-blade 57.6 GB/s
Memory: 28.8 GB/s peak per processor module
QPI, IOH to processors: 38.4 GB/s
-
Performance Features of i2 Blades
Memory
Dual integrated memory controllers (IMC) with 4 SMI channels; peak memory bandwidth up to 34 GB/s (6x)
Capability to support up to 1 TB of memory per IMC
1 MB directory cache per IMC
Directory-based cache coherency reduces snoop traffic and contention
DDR3: higher throughput (800 MT/s), lower power, faster response time, increased capacity per DIMM (16 GB)
SMI
Intel Scalable Memory Interconnect (Intel SMI) connects to the Intel 7500 Scalable Memory Buffer (SMB) to support larger physical memory with DDR3 RDIMMs
SMB supports different sizes and types of DIMM
Processor
Enhanced thread-level parallelism (TLP) [8T/P]
Instruction-level parallelism (ILP) minimizes threads stalling the pipeline
Data TLB support for 8K and 16K pages
Intel Turbo Boost Technology: performance on demand
Intel VT-i2 is introduced
QPI
New Intel QuickPath Interconnect technology replaces the Front Side Bus with a point-to-point interconnect
4 full-width Intel QuickPath Interconnect links and 2 half-width links per processor
Peak processor-to-processor and processor-to-I/O communications up to 96 GB/s (9x)
Glueless system designs up to eight sockets; overcomes FSB limitations
IO
PCIe Gen 2 supports 5 GB/sec
Flex-10 dual-ported 10 GbE NICs help with bandwidth partitioning
[Block diagram: two Itanium 9300 (Tukwila-MC) processors connected by QPI, each with four memory buffers (MB) and attached DDR3 DIMMs; an Intel 7500 IOH (Boxboro-MC) and an Intel ICH10 provide PCIe Gen2 and Gen1 connectivity to PCIe devices]
-
NUMA in i2 Servers
Each socket/processor has its own memory (local)
Each processor can access other processors' memory (remote)
In a one-blade or two-blade server, every access is at most one hop
Example: BL860c i2, BL870c i2
In a four-blade server, the maximum will be two hops
Example: BL890c i2
The new i2 servers come with 5 different memory configurations, helping customers to profile their application needs accordingly
Details are part of this white paper: Why Scalable Blades - HP Integrity Server Blades
http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA1-1295ENW.pdf
[Diagram: Socket 0 to Socket 3 across Blade 1 and Blade 2, connected by the Scalable Blade Link; processors labeled P1, P2, P3]
-
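The hop counts on the NUMA slide can be captured in a small lookup. The sketch below is only illustrative: the hop table comes from the slide, but the linear latency model and the example latency values are assumptions made here, not measurements from the webinar.

```python
# Worst-case remote-access hops by number of blades, as stated on the NUMA slide:
# one-blade and two-blade servers need at most one hop, four-blade servers at most two.
MAX_HOPS_BY_BLADES = {1: 1, 2: 1, 4: 2}


def worst_case_hops(blades: int) -> int:
    """Return the maximum number of hops for a remote memory access."""
    if blades not in MAX_HOPS_BY_BLADES:
        raise ValueError(f"unsupported blade count: {blades}")
    return MAX_HOPS_BY_BLADES[blades]


def estimated_remote_latency_ns(local_ns: float, per_hop_ns: float, blades: int) -> float:
    """Hypothetical model (assumption): latency grows linearly with hop count."""
    return local_ns + per_hop_ns * worst_case_hops(blades)


if __name__ == "__main__":
    # The 100 ns and 50 ns inputs are made-up placeholders, not measured values.
    for blades in (1, 2, 4):
        print(blades, "blade(s):", worst_case_hops(blades), "hop(s),",
              estimated_remote_latency_ns(100.0, 50.0, blades), "ns (illustrative)")
```

Real remote-access cost depends on the memory configuration chosen, so any such model would need to be calibrated against measurements on the target system.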
Key Characteristics: Intel Itanium processor 9100 vs. Intel Itanium processor 9300
Cores
  9100: 2
  9300: 4
Total On-Die Cache
  9100: 27.5 MB
  9300: 30 MB
Software Threads per Core
  9100: 2
  9300: 2 (w/ enhanced thread management)
System Interconnect (bandwidth per processor for a 2-socket system)
  9100: Front Side Bus; peak bandwidth per processor: 5 GB/s
  9300: Intel QuickPath Interconnect Technology; peak bandwidth: 48 GB/s (up to 9x improvement); enhanced RAS; enables common IOHs with next-generation Intel Xeon processors
Memory Interconnect (bandwidth per processor for a 2-socket system)
  9100: Front Side Bus; peak bandwidth per processor: 5 GB/s
  9300: Dual Integrated Memory Controllers; peak bandwidth 34 GB/s (up to 6x improvement)
Memory Capacity (4-socket system)
  9100: 128-384 GB
  9300: 1 TB (using 16 GB RDIMMs); up to 8x improvement
Partitioning and Virtualization
  9100: Intel VT-i
  9300: Intel VT-i2
Energy Efficiency
  9100: Demand Based Switching (DBS)
  9300: Enhanced DBS (voltage modulation in addition to frequency); Intel Turbo Boost Technology; advanced CPU and memory thermal management
SMP Scalability
  9100: 64-bit virtual addressability; 50-bit physical addressability; home snoop coherency
  9300: 64-bit virtual addressability; 50-bit physical addressability; directory coherency for better performance in large SMP configurations; up to 8-socket glueless systems (higher scalability with OEM chipsets)
Source: Intel Corporation
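The "up to Nx" figures in this comparison can be cross-checked from the peak numbers quoted alongside them. The following Python sketch does that arithmetic; the constant and function names are illustrative, and reading 1 TB as 1024 GB is an assumption.

```python
# Peak figures quoted in the 9100 vs. 9300 comparison.
FSB_PEAK_GBPS = 5.0        # 9100: Front Side Bus, per processor
QPI_PEAK_GBPS = 48.0       # 9300: QuickPath Interconnect
IMC_PEAK_GBPS = 34.0       # 9300: dual integrated memory controllers
MEM_9100_MIN_GB = 128.0    # 9100: low end of the 4-socket capacity range
MEM_9300_GB = 1024.0       # 9300: 1 TB, taken here as 1024 GB (assumption)


def improvement(new: float, old: float) -> float:
    """Return the ratio of the new figure to the old one."""
    return new / old


if __name__ == "__main__":
    print(f"System interconnect: {improvement(QPI_PEAK_GBPS, FSB_PEAK_GBPS):.1f}x")
    print(f"Memory interconnect: {improvement(IMC_PEAK_GBPS, FSB_PEAK_GBPS):.1f}x")
    print(f"Memory capacity:     {improvement(MEM_9300_GB, MEM_9100_MIN_GB):.1f}x")
```

The ratios come out at 9.6x, 6.8x and 8.0x, so the slide's "up to 9x" and "up to 6x" appear to be rounded down, and the "up to 8x" capacity figure is relative to the 128 GB low end.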
-
Performance of new i2 Servers
-
Positioning the New vs. Current Servers (OpenVMS)
Montvale-based Integrity servers: BL860c, rx2660, rx3600, rx6600, rx7640, BL870c, rx8640, Superdome
Integrity servers based on Blade Scale Architecture: rx2800 i2, BL860c i2, BL870c i2, BL890c i2, Superdome 2 (8s), Superdome 2 (32s)
Up to 2x performance improvement per socket
[Chart: servers positioned by core count; legend: Primary, Secondary]
-
Performance Highlights
OpenVMS running on BL8x0c i2 servers
BL8x0c i2 servers architected for high performance
Architecture provides increased number and faster cores/socket
Superior memory and interconnect technology
Memory intensive applications benefit from low latency and high bandwidth architecture
Higher IO bandwidth and throughput resulting from new IO architecture
More headroom for CPU, memory and IO intensive workloads with improved response time
Up to 2x performance improvement with i2 servers running OpenVMS
Our tests have shown up to 2x improvement with Java, some database and web server applications
Oracle has shown up to 3x improvement
-
Apache Performance
Apache Bench Tests on OpenVMS 8.4
[Charts: three bar charts comparing BL860c (1.59GHz/9.0MB) and BL860c-i2 (1.73GHz/6.0MB): Time Taken (sec, less is better); Throughput (Req/sec, more is better); Bandwidth (KB/sec, more is better). Callout: 2x Improvement]
Configuration Details
The tests were run on OpenVMS 8.4 with Apache 2.1-1 with ECO2 and Apache Bench 2.0.40-dev
Time Taken should be less; Req/sec and KB/sec should be more
BL860c-i2 delivered 2x the performance of the BL860c
-
Java Workload Tests
Native Java Tests on OpenVMS 8.4 (More is Better)
[Charts: two line charts titled "Java Workload", operation rate (0 to 140000) vs. threads (8 to 16, and 0 to 18), comparing rx6600 (1.59GHz/12.0MB) and BL870c i2 (1.60GHz/5.0MB). Callout: 2x Improvement]
Java workloads scale up better on i2 servers
Java workloads are highly CPU and memory intensive
-
Oracle 10gR2 on new i2 Server
[Chart: TPM (0 to 20000) vs. users (16, 32, 48), comparing rx7640 (1.60GHz/12.0MB) and BL890c i2 (1.60GHz/6.0MB). Callout: 3x Improvement]
Oracle Swingbench tests were run with the same tuning configuration on both systems
rx7640 and BL890c i2 are NUMA-based systems: Mostly NUMA, RAD enabled and Hyper-Threading disabled, 6 RAID 5 EVA8100 volumes
Oracle was run in Shared Server mode
BL890c i2 server consistently shows 3x improvement for the same number of users
-
Oracle Tests Resource Usage
Oracle TPM for Same CPU Usage
[Chart: bar chart of TPM (0 to 16000) for rx7640 (1.60GHz/12.0MB) and BL890c i2 (1.60GHz/6.0MB). Callout: 3x Increase]
BL890c i2 is able to drive 3x improvement for the same CPU usage
-
Integer Tests
CPU Ratings
These numbers are per processor/socket
As the frequency increases, we see an increase in rating
CPU-bound applications (for example, database queries) should benefit, specifically applications bound by integer computation
[Chart: bar chart of per-processor ratings (0 to 4000, more is better) for 9300 - BL8x0c-i2 (1.73GHz/6.0MB), 9300 - BL8x0c-i2 (1.60GHz/6.0MB), 9300 - BL8x0c-i2 (1.33GHz/4.0MB), 9000 - BL860c (1.59GHz/9.0MB) and 9100 - rx7640 (1.60GHz/12.0MB)]
The 1.73 GHz and 1.6 GHz 9300 series processors show 2-2.3x performance improvement
-
Floating Point Computation Tests
CPU Ratings
These numbers are per core (within a processor/socket)
Intel Itanium 9300-series processors have a new high-precision floating-point architecture
Fast response to complex operations; scientific, automation and robotic applications should benefit
[Chart: bar chart of FP ratings (0 to 9000, more is better), labeled "Per Processor", for 9300 - BL8x0c-i2 (1.73GHz/6.0MB), 9300 - BL8x0c-i2 (1.60GHz/6.0MB), 9300 - BL8x0c-i2 (1.33GHz/4.0MB), 9000 - BL860c (1.59GHz/9.0MB) and 9100 - rx7640 (1.60GHz/12.0MB)]
The 1.73 GHz and 1.6 GHz 9300 series processors show 2.1-2.3x performance improvement
-
V8.4 Performance Preview and Features
-
HP Delivers Continuous Performance Improvements
OpenVMS 8.4 delivers 10-15% improvement
Releases: OpenVMS 8.3, OpenVMS 8.3-1H1, OpenVMS 8.4
Significant performance enhancements incorporated in each release
8.4 Performance Enhancements
RAD support for IA64
Shadowing features
Compiler changes
Faster cache flushing
DLM enhancements
Exception handling changes
SMP enhancements
RTL changes
RMS MBC > 127
-
V8.4 Performance Features..
Resource Affinity Domain (RAD) support for IA64
Packet Processing Engine (PPE) Support for TCP/IP
Automatic Dynamic Processor Resilience (DPR)
Compression support for BACKUP
RMS SET MBC count support for 255 blocks
Asynchronous Virtual IO (AVIO) support for Guest OpenVMS running on HPVM
OpenVMS V8.4
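To put the RMS multiblock count (MBC) change in perspective, the sketch below compares the previous limit of 127 blocks with the new limit of 255, using the 512-byte OpenVMS disk block. It assumes a purely sequential read in which every I/O moves a full MBC worth of blocks; the function names and the example file size are illustrative.

```python
BLOCK_BYTES = 512      # size of one OpenVMS disk block
OLD_MAX_MBC = 127      # previous RMS multiblock count limit
NEW_MAX_MBC = 255      # limit supported in V8.4


def transfer_bytes(mbc: int) -> int:
    """Bytes moved by one I/O with the given multiblock count."""
    return mbc * BLOCK_BYTES


def transfers_needed(file_blocks: int, mbc: int) -> int:
    """I/O count for a sequential pass over the file (ceiling division)."""
    return -(-file_blocks // mbc)


if __name__ == "__main__":
    file_blocks = 1_000_000  # hypothetical file size in blocks
    for mbc in (OLD_MAX_MBC, NEW_MAX_MBC):
        print(f"MBC={mbc}: {transfer_bytes(mbc)} bytes per I/O, "
              f"{transfers_needed(file_blocks, mbc)} I/Os for {file_blocks} blocks")
```

Under these assumptions, roughly doubling the MBC roughly halves the number of I/Os for a sequential pass; whether that translates into elapsed-time gains depends on the storage and the workload.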
-
V8.4 Performance Enhancements..
Shadow feature improvements to WriteBitMap, MiniCopy, MiniMerge and SPLIT_READ_LBN
Core OS improvements
Dedicated lock manager using pre-fetch
PE Driver Optimizations
Exception Handling Optimizations
Deferred SCHED AST Queuing
Changes to avoid MMG SPL contention
Optimizations in Global Section Deletion and Creation Algorithms
Enabling IMS up calls for multithreaded applications
Introduced Paged Pool Look Aside List (LAL)
SYSMAN IO AUTO Performance Improvements (Fibre Only)
RTL Changes to optimize strcmp() and memcmp()
Support for new high speed USB connectivity
Compiler improvements
Miscellaneous Improvements
OpenVMS V8.4
-
OpenVMS Guest Performance Summary
-
OpenVMS Guest Performance
The native CPU and virtual CPU integer and floating point tests have the same rating
The memory access speed and throughput are similar to the native host on the same hardware
We do see a 20-40% application penalty on Java workloads
Value Proposition of OpenVMS Guest on HPVM
Hardware consolidation for applications which don't rely on performance
Scenario:
Monolithic & distributed application development & testing
Qualification on multiple OS versions
Development & testing on multiple configurations
Benefits:
Cheaper: fewer test boxes
Faster: ready to boot or ready to use
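The 20-40% Java penalty quoted for OpenVMS guests can be turned into a rough sizing range. The sketch below assumes the penalty is a reduction in throughput relative to the native host, which is one interpretation and not something the slides spell out; the function name and the native operation rate are hypothetical.

```python
def guest_throughput_range(native_ops: float,
                           low_penalty: float = 0.20,
                           high_penalty: float = 0.40) -> tuple[float, float]:
    """Return (worst case, best case) guest throughput for a given native rate.

    Assumption: the penalty is a fractional drop in throughput versus native.
    """
    worst = native_ops * (1.0 - high_penalty)
    best = native_ops * (1.0 - low_penalty)
    return worst, best


if __name__ == "__main__":
    native = 100_000.0  # hypothetical native operation rate
    worst, best = guest_throughput_range(native)
    print(f"Expected guest range: {worst:.0f} to {best:.0f} operations")
```

A range like this is only a planning aid; as the first slide of the deck notes, results come from an engineering test environment and actual workloads may differ.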
-
Finally.
-
Performance Enhanced with new i2 Blades running OpenVMS V8.4
Up to 2X faster performance with built-in resiliency and less power consumption
2- & 4-socket dual-core Integrity servers
Integrity server blades based on Blade Scale Architecture
2.3x Integer & Floating Point Tests
Up to 2x Application Performance
Per-socket performance increases
OpenVMS V8.4
-
References and Contacts
T4 & Friends was used across many benchmarks: http://h71000.www7.hp.com/openvms/products/t4/
Please send across any feedback on performance [email protected]
OpenVMS 8.4 Documentation http://h71000.www7.hp.com/doc/os84_index.html
OpenVMS 8.4 New Features Documentation http://h71000.www7.hp.com/doc/84final/6679/6679pro.html
Feedback [email protected]
Business Manager Vivasvan Shastri ([email protected])
-
Supported NUMA Configurations