Numerix CrossAsset XL and Windows HPC Server 2008 R2: Faster Performance for Valuation and Risk Management in Complex Derivative Portfolios
Microsoft Corporation
Published: February 2011
Abstract
Numerix, a leading provider of cross-asset analytics for derivative portfolio valuation and
risk management, is working together with Microsoft to enable Numerix CrossAsset XL with
Windows HPC Server 2008 R2 and Windows HPC Services for Excel 2010. This high-
performance computing (HPC) solution allows financial services professionals—from
traders and risk managers to insurance actuaries—to more efficiently manage their
portfolios and assess risk on an interactive and day-to-day basis because the solution
provides enhanced accuracy and more timely pricing and risk information.
This paper presents benchmark and performance test results for typical derivative
portfolio use cases. The test results show that portfolio calculation speed increased almost
linearly as more compute nodes were added to an HPC cluster. In terms of compute time,
the testing showed the following excellent results:
For a portfolio of 10,000 Foreign Exchange (FX) trades, it took 100.2 minutes to compute
results on a standalone desktop with two quad-core processors. Running it on a two-
node cluster with one head node and one compute node (two quad-core processors per
node) reduced compute time to 12.5 minutes—an 88 percent improvement.
On a nine-node cluster (two quad-core processors per node), calculation speed for
10,000 FX trades was almost 50 times faster than with a standalone desktop.
For a variable annuity Guaranteed Minimum Benefit (GMxB) policy set of 10,000, total
compute time was reduced from 139.6 minutes on a desktop to 2.6 minutes on a 72-
core HPC cluster—a 98 percent improvement.
These faster calculations save large amounts of time. Financial services professionals get the
answers they need faster, so they can respond more quickly to changing market dynamics as
they manage their derivative portfolios.
Windows HPC Server 2008 R2/Numerix Benchmark Results White Paper
Disclaimer
©2011 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and
views expressed in this document, including URL and other Internet Web site references, may change
without notice. You bear the risk of using it.
Some examples are for illustration only and are fictitious. No real association is intended or inferred.
This document does not provide you with any legal rights to any intellectual property in any Microsoft
product. You may copy and use this document for your internal, reference purposes.
Table of Contents
Challenges to Managing Complex Derivative Portfolios on a Daily Basis
Risk Management with Numerix and Microsoft High-Performance Computing
HPC Performance Testing for Derivative Portfolio Calculations
Test Environment
Use Cases and Test Results
Use Case 1: FX Trader in Numerix CrossAsset XL
Use Case 1 Test Results
Use Case 2: Variable Annuity GMxB Policy Pricing in Numerix CrossAsset XL
Use Case 2 Test Results
Conclusion
Resources
Benchmark Results for Windows HPC Server 2008 R2 and Numerix CrossAsset XL 1
Challenges to Managing Complex Derivative Portfolios on a Daily Basis
Managing risk and valuing complex derivative portfolios on a daily basis is challenging and
computationally intensive. Traders, risk managers, and actuaries in capital markets and insurance
must have timely and accurate information to assess value and risk. However, they are faced with
several major challenges:
Choosing between time and accuracy: Traders, risk managers, and actuaries have had to rely
on estimates and rough calculations based on stale data.
Capturing all derivative trading activity into a common framework with consistent valuations
across asset classes has been time-consuming, cumbersome, and error prone.
Running risk analysis on large, computationally intensive derivative portfolios and bespoke
deal types generally takes too long to provide information that can be used for daily risk
management.
Market uncertainty, increasing regulatory demands, and
new accounting standards (as outlined in Basel II, FAS
133/157, and IAS 39, for example) are creating increased
pressure on derivative portfolio managers to establish
accurate, timely, and consistent pricing, risk, and reporting
measures enterprise-wide.
To meet these challenges, financial services professionals need a
powerful, highly scalable solution for the pricing, valuation, and
risk management of today’s most complex derivative portfolios.
Risk Management with Numerix and Microsoft High-Performance Computing
Currently, many financial services professionals are limited in their
ability to perform valuations and assess risk by the computing
power of their individual desktops. To increase performance,
forward-looking companies are creating solutions that use high-
performance computing. Numerix and Microsoft are working
together to provide a cost-effective HPC-based solution for
managing risk and valuing derivative or variable annuity portfolios.
Numerix CrossAsset XL takes advantage of the features of the
Windows HPC Server 2008 R2 and Windows HPC Services for Excel
solution to access powerful grid-computing capabilities. These
tools from Numerix, when coupled with the value of an integrated
HPC solution from Microsoft, provide:
Rapid unified risk calculations.
Accelerated real-time valuations.
Improved systems productivity.
Interoperability and full transparency for deal definitions.
Numerix CrossAsset XL
A flexible, Microsoft Excel–based platform for structuring, pricing, and analyzing derivative or
structured products.
Features
Grid-computing enabled for parallel Excel computations, with built-in support for Windows HPC Server 2008 R2.
Broad instrument support, available as a complete cross-asset solution or individual modules.
Comprehensive library of single- and multi-factor models and high-performance numerical methods.
Hundreds of templates and examples, and a “Structuring Wizard” for new deal types.
Simple payoff scripting language for defining bespoke instruments.
Full valuations and Greeks (even for structured products).
HPC Performance Testing for Derivative Portfolio Calculations
In this paper, Numerix and Microsoft demonstrate that adding computational capacity significantly
improves performance on complex calculations for derivative portfolios. The performance testing was
designed to answer the following questions:
Does performance improve by using an HPC cluster for portfolio calculations?
How much does performance change as the number of cores increases?
What is the HPC cluster overhead?
Test Environment
A series of tests were conducted on a 12-node HPC cluster. Each node was configured identically as
shown in Table 1. Each test was run on a standalone node to establish a baseline for standalone
desktop performance. The desktop system had two quad-core processors, for a total of eight cores.
For cluster performance testing, one of the
nodes was designated as the head node,
which performed the following tasks:
Deployed copies of the test
workbook to the rest of the nodes
(called compute nodes).
Opened and closed Excel on the
compute nodes.
Hosted the master Excel
spreadsheet used for each test.
Updated the results in the master
spreadsheet.
The compute nodes handled the calculations using Excel in server mode (no GUI running) and sent the
results back to the head node.
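The division of labor between the head node and the compute nodes can be sketched in a few lines of Python. This is only an illustration of the scatter/gather pattern, not the Windows HPC Services for Excel interface: the function names (`price_trade`, `split_into_chunks`, `compute_node`, `head_node`) are hypothetical, the "price" is a placeholder, and a local thread pool stands in for the compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def price_trade(trade_id):
    """Hypothetical stand-in for one workbook calculation on a compute node."""
    return trade_id, float(trade_id) * 1.5  # placeholder result, not a real price


def split_into_chunks(items, n_chunks):
    """Divide the work into n_chunks nearly equal contiguous slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def compute_node(chunk):
    """Each worker prices its own slice and returns (trade_id, value) pairs."""
    return [price_trade(t) for t in chunk]


def head_node(trade_ids, n_workers):
    """Scatter chunks to the workers, then gather the results into one master table."""
    master = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunks = split_into_chunks(trade_ids, n_workers)
        for partial in pool.map(compute_node, chunks):
            master.update(partial)  # aggregation step performed on the head node
    return master


results = head_node(list(range(1000)), n_workers=8)
```

In this sketch the aggregation loop runs in a single place, which mirrors the way the master spreadsheet on the head node collects every result.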
Software used in the testing included the following:
Numerix CrossAsset XL
Windows HPC Server 2008 R2
Windows HPC Services for Excel 2010
Microsoft Office Excel 2010
The performance tests were run two to three times to ensure that the results were consistent.
96-Core HPC Cluster Compute Node Configuration
Compute nodes                 12
Processors per node           2 quad-core
Cores per node                8
Total cores in HPC cluster    96
Type of processor             Intel Xeon E5345 (8 MB Cache, 2.33 GHz)
Table 1. HPC cluster compute node configuration
Use Cases and Test Results
Two use cases, each one common to a particular industry or asset class, were tested using the HPC
cluster:
1. Foreign Exchange: Commonly traded and semi-exotic deals
(Numerix CrossAsset XL FX Trader module).
2. Insurance industry: Variable annuity GMxB policy pricing
(Numerix CrossAsset XL).
Use Case 1: FX Trader in Numerix CrossAsset XL
This use case tested HPC performance on the Numerix CrossAsset
XL FX Trader module, which provides an intuitive Excel-based
workflow interface for traders to apply sophisticated pricing and
risk models to many commonly traded and semi-exotic deals. The
CrossAsset XL FX Trader module provides all the building blocks
needed to simulate unique portfolio trades, along with pre-built
templates to get started quickly.
Sets of 1,000, 5,000, and 10,000 trades were calculated with
increasing numbers of cores. The following types of trades were
tested in this use case:
Barrier
Digital Barrier
European
Spot/Forward
Touch
The tests computed the present value (PV) of each trade, along
with the following first and second order Greeks:
Delta
Delta CCY
Gamma
Phi
Rho
Theta
Vanna
Vega
Volga
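To give a sense of what computing a PV plus Greeks involves for a single trade, the sketch below prices a European FX call with the standard Garman-Kohlhagen formula and estimates Delta, Gamma, and Vega by bump-and-revalue. It is an illustration only: the paper does not state which models or differentiation methods the FX Trader module uses, and the function names and input values here are assumptions chosen for the example.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gk_call(spot, strike, vol, t, r_dom, r_for):
    """Garman-Kohlhagen present value of a European FX call."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return (spot * math.exp(-r_for * t) * norm_cdf(d1)
            - strike * math.exp(-r_dom * t) * norm_cdf(d2))


def bumped_greeks(spot, strike, vol, t, r_dom, r_for, ds=1e-3, dv=1e-4):
    """Central-difference Delta, Gamma, and Vega obtained by repricing with bumped inputs."""
    pv = gk_call(spot, strike, vol, t, r_dom, r_for)
    up = gk_call(spot + ds, strike, vol, t, r_dom, r_for)
    dn = gk_call(spot - ds, strike, vol, t, r_dom, r_for)
    v_up = gk_call(spot, strike, vol + dv, t, r_dom, r_for)
    v_dn = gk_call(spot, strike, vol - dv, t, r_dom, r_for)
    return {
        "PV": pv,
        "Delta": (up - dn) / (2.0 * ds),
        "Gamma": (up - 2.0 * pv + dn) / ds ** 2,
        "Vega": (v_up - v_dn) / (2.0 * dv),
    }


# Illustrative inputs only (not taken from the test portfolios).
greeks = bumped_greeks(spot=1.30, strike=1.30, vol=0.10, t=1.0, r_dom=0.02, r_for=0.01)
```

Under a bump-and-revalue approach like this one, each additional Greek costs extra repricings per trade, so the work grows with both the trade count and the number of sensitivities requested.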
Windows HPC Server 2008 R2
An interoperable HPC solution with a productive development environment for organizations that
have not had access to HPC capabilities in the past.
Features
Seamlessly scale from workstation to cluster by making it possible for users to harness the power of
distributed computing through a familiar Windows desktop environment without requiring
specialized skills or training.
Rapidly develop HPC applications using the Microsoft Visual Studio development system, which
provides a comprehensive parallel programming environment.
Improve systems administration and cluster interoperability by simplifying the overall deployment,
administration, and management over the entire system lifetime, while ensuring interoperability
with existing systems infrastructure.
Use Case 1 Test Results
Cluster vs. Desktop Performance
Testing showed that performance improved significantly when trade calculations were run on an HPC
cluster as opposed to a standalone desktop. As shown in Figure 1, the 10,000 trade set exhibited a
substantial increase in speed as more cores were added. With as few as 32 cores, calculation speed
was 27 times faster than on the desktop. With 72 cores, calculation speed was almost 50 times faster.
Figure 1. FX trade calculation speed up on an HPC cluster relative to a standalone desktop
From a total compute time perspective, simply
using a two-node cluster with eight cores to
calculate large trade sets resulted in dramatic
improvements in compute time. As Table 2 shows,
it took 100.2 minutes to compute the results of
10,000 FX trades on a standalone desktop with two
quad-core processors. Running the same
calculations on a two-node cluster with a head
node and one compute node, each with two quad-
core processors, took only 12.5 minutes—an 88
percent improvement.
The 5,000 and 1,000 trade sets experienced similar improvements in compute time at 86 and 81 percent,
respectively. These significant improvements in compute time are due to Excel calculation being
spread across multiple cores and multiple machines that operate in parallel.
Total Compute Time (minutes)
                      FX trade count
                      10,000    5,000    1,000
Desktop                100.2     48.0      9.6
Cluster w/8 cores       12.5      6.6      1.9
Improvement              88%      86%      81%
Table 2. Improvement in compute time for FX trade sets on a two-node HPC cluster versus a desktop
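The percentages in Table 2 can be reproduced from the reported compute times. The short Python sketch below shows the arithmetic; the helper names are illustrative and the inputs are the rounded values printed in the table.

```python
# Total compute time in minutes, copied from Table 2.
desktop = {10000: 100.2, 5000: 48.0, 1000: 9.6}
cluster_8_cores = {10000: 12.5, 5000: 6.6, 1000: 1.9}


def improvement_pct(baseline, new):
    """Percent reduction in compute time relative to the baseline."""
    return 100.0 * (baseline - new) / baseline


def speedup(baseline, new):
    """How many times faster the new run is than the baseline."""
    return baseline / new


for trades in desktop:
    pct = improvement_pct(desktop[trades], cluster_8_cores[trades])
    factor = speedup(desktop[trades], cluster_8_cores[trades])
    print(f"{trades:>6} trades: {pct:.1f}% less time, {factor:.1f}x faster")
```

Because the table entries are rounded to one decimal place, a recomputed percentage can differ by about a point from the published figure (the 1,000 trade set is the clearest case); the published values were presumably derived from unrounded timings.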
Cluster Performance
In all of the tests, larger calculation sets experienced much greater performance gains than smaller
sets as cores increased (Figure 2). Calculation times dropped dramatically for the 5,000 and 10,000
trade sets as up to 32 cores were added. Although smaller, the performance gains from 32 to 72 cores
were still considerable. For example, running 60 portfolios with 10,000 trades each on 72 cores versus
32 cores would save almost two hours in computing time (1.7 minutes less per portfolio), giving
financial services professionals vital information much faster.
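The "almost two hours" figure follows directly from the per-portfolio saving. As a quick check in Python, using only the numbers quoted in the paragraph above:

```python
MINUTES_SAVED_PER_PORTFOLIO = 1.7  # 72 cores versus 32 cores, from the text
PORTFOLIOS = 60

total_minutes = MINUTES_SAVED_PER_PORTFOLIO * PORTFOLIOS
total_hours = total_minutes / 60.0
print(f"{total_minutes:.0f} minutes saved, or about {total_hours:.1f} hours")
```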
Figure 2. HPC cluster performance gains based on total compute time in minutes for FX trades
The long, tapering performance tail shown in Figure 2 was common in all use cases. Two primary
factors explain the decrease in HPC cluster performance after 32 cores:
Bandwidth between the head node and the compute nodes
As more cores send data back to the head node, bandwidth is consumed very quickly, creating
a bottleneck for data coming into the head node. Also, the head node handles
workbook deployment to the compute nodes. In some cases, the head node may still be
deploying workbooks when the first results begin to come in. This two-way traffic decreases
the bandwidth available to receive results from the compute nodes. Though there are techniques
for reducing this overhead, such as shared memory, none were employed here.
The head node’s ability to process results coming in from the compute nodes
The master Excel spreadsheet resides on the head node. Performance may be constrained by
how quickly the incoming results can be aggregated and properly placed in the master
spreadsheet.
The reduction in compute time for large trade sets between eight and 32 cores is very positive. For
10,000 FX trades, the compute time dropped from 12.5 minutes on eight cores to 3.7 minutes on 32
cores. Similar performance gains can be extrapolated for much larger trade sets.
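One way to read the eight-to-32-core result is as parallel efficiency: the observed speedup divided by the ideal speedup implied by the increase in cores. The sketch below applies that standard definition to the two timings quoted above; the function name is illustrative, and the paper itself does not report efficiency figures.

```python
def scaling(t_small, cores_small, t_large, cores_large):
    """Return (speedup, efficiency) when moving from the small to the large core count."""
    speedup = t_small / t_large
    ideal = cores_large / cores_small
    return speedup, speedup / ideal


# 10,000 FX trades: 12.5 minutes on 8 cores, 3.7 minutes on 32 cores.
fx_speedup, fx_efficiency = scaling(12.5, 8, 3.7, 32)
print(f"speedup {fx_speedup:.2f}x on 4x the cores, efficiency {fx_efficiency:.0%}")
```

An efficiency below 100 percent is expected here, since some of the added capacity is absorbed by the head-node bottlenecks described above.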
Cluster Overhead
As cores increased, HPC cluster overhead—activities not related to calculations—also increased.
However, larger sets of trade calculations exhibited less overhead than smaller sets (Figure 3). For
example, a small 1,000 trade set running on 72 cores spent 53 percent of the total computing time on
overhead activities. In contrast, the 10,000 trade set running on 72 cores only spent 23 percent of the
total computing time on cluster overhead. The trend shown here suggests that the most efficient use
of an HPC cluster is to run large sets of calculations on fewer cores, thus reducing overhead.
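To put the overhead percentage in absolute terms, the sketch below splits a run's total time into overhead and calculation minutes. The 72-core total for the 10,000 trade set is not reported directly; the value used here is inferred from the 3.7 minutes measured on 32 cores less the 1.7-minute saving quoted earlier, so the result should be treated as an approximation.

```python
def split_overhead(total_minutes, overhead_fraction):
    """Split a run's total time into (overhead, calculation) minutes."""
    overhead = total_minutes * overhead_fraction
    return overhead, total_minutes - overhead


# Inferred, not reported directly: 3.7 minutes on 32 cores minus the 1.7-minute saving.
total_72_cores = 3.7 - 1.7
overhead, calculation = split_overhead(total_72_cores, 0.23)
print(f"overhead {overhead:.2f} min, calculation {calculation:.2f} min")
```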
Figure 3. Percent of total computing time spent on HPC cluster overhead during FX trade calculations
Use Case 2: Variable Annuity GMxB Policy Pricing in Numerix CrossAsset XL
This use case focuses on pricing variable annuity GMxB policies. These standard insurance derivative
products have many options and must rely on pricing models and computational algorithms to create
accurate prices for the policies. The data used in this case came from the Numerix CrossAsset XL GMxB
Excel workbook, which contains millions of policies with many different variables. The data sets tested
here represent a well-defined sample of policies.
The tests calculated the present value of each policy in the set using 1,000 Monte Carlo paths. The
model used for each asset was:
Hybrid: [HW2 (USD) + Heston (SPX) + Heston (RTY) + Credit (CDX) ] + Actuarial data
(Mortality/Lapse/Withdrawal Tables)
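The structure of a 1,000-path Monte Carlo valuation can be sketched with a deliberately simplified stand-in. The example below values a single maturity guarantee on one fund that follows geometric Brownian motion with constant rate and volatility; it does not implement the hybrid HW2/Heston/credit model or the actuarial tables listed above, and the function name and parameter values are assumptions made for illustration. Only the path count comes from the tests.

```python
import math
import random


def gmmb_guarantee_pv(account, guarantee, rate, vol, years, n_paths=1000, seed=42):
    """Monte Carlo PV of a maturity guarantee paying max(guarantee - account_T, 0).

    Simplifying assumptions: one fund following geometric Brownian motion,
    constant rate and volatility, no fees, mortality, lapses, or withdrawals.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    drift = (rate - 0.5 * vol ** 2) * years
    diffusion = vol * math.sqrt(years)
    total = 0.0
    for _ in range(n_paths):
        terminal = account * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(guarantee - terminal, 0.0)
    return math.exp(-rate * years) * total / n_paths


# Illustrative policy: 100 invested, 100 guaranteed at maturity in 10 years.
pv = gmmb_guarantee_pv(account=100.0, guarantee=100.0, rate=0.03, vol=0.20, years=10.0)
```

Even in this reduced form, every policy requires a full loop over the simulated paths, which is why the workload divides naturally across compute nodes policy by policy.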
Performance was measured as increasing numbers of cores were used to run the calculations on the
HPC cluster.
Use Case 2 Test Results
Cluster vs. Desktop Performance
As with the FX trades, larger calculation sets experienced much greater performance gains than
smaller sets when run on an HPC cluster. Compared to a standalone desktop, the 10,000 policy
calculation was 29.5 times faster on 32 cores and 53.7 times faster on 72 cores (Figure 4).
Figure 4. GMxB policy calculation speed up on an HPC cluster relative to a standalone desktop
The 1,000 policy set exhibited smaller performance gains, running 13.2 times faster on 72 cores than
on a desktop. Given the results for the 1,000 policy set, it is clear that the larger the calculation set,
the better the performance on an HPC cluster.
Total compute time for the 10,000 policy set
on a desktop was 139.6 minutes, or two
hours 19 minutes. When run on 72 cores in
an HPC cluster, it took only 2.6 minutes.
Similar performance gains happened with
the smaller policy sets (Table 3).
Cluster Performance
In all of the policy pricing tests, the larger
calculation sets experienced greater
performance gains than smaller sets as cores
increased (Figure 5), especially as up to 32
cores were added. The performance gains from 32 to 72 cores were smaller but still created important
time savings for situations where many portfolios are calculated.
Figure 5. HPC cluster performance gains based on total compute time in minutes for GMxB policies
These results are very similar to the performance gains seen in the FX trades testing, as are the smaller
performance gains above 32 cores. Improvements in compute time for large policy sets between eight
and 32 cores remained very positive for the GMxB policies: For the 10,000 policy set, compute time
dropped from 16.5 minutes on eight cores to 4.7 minutes on 32 cores. Based on these findings, similar
performance gains can be extrapolated for much larger policy sets.
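A rough way to test this kind of extrapolation is to fit a simple model to the measured points and compare its prediction with a third measurement. The sketch below assumes compute time has a fixed component plus a component that divides evenly across cores (an Amdahl-style model that is not part of the original analysis), fits it to the eight- and 32-core timings for the 10,000 policy set, and predicts the 72-core time, which was measured at 2.6 minutes.

```python
def fit_two_point(n1, t1, n2, t2):
    """Solve t = fixed + parallel / n through two (cores, minutes) points."""
    parallel = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
    fixed = t1 - parallel / n1
    return fixed, parallel


def predict(fixed, parallel, cores):
    """Predicted compute time in minutes for the given core count."""
    return fixed + parallel / cores


# 10,000 GMxB policies: 16.5 minutes on 8 cores, 4.7 minutes on 32 cores.
fixed, parallel = fit_two_point(8, 16.5, 32, 4.7)
estimate_72 = predict(fixed, parallel, 72)
print(f"fixed {fixed:.2f} min, parallel {parallel:.1f} core-min, 72 cores: {estimate_72:.2f} min")
```

The prediction lands close to the measured value, which is consistent with the tapering curve. The model treats the fixed component as constant, whereas the tests indicate that overhead grows with core count, so it is best read as a rough guide rather than a forecast.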
Total Compute Time (minutes)
                       Variable annuity policy count
                       10,000    5,000    1,000
Desktop                 139.6     69.9     14.2
Cluster w/72 cores        2.6      1.7      1.1
Improvement               98%      98%      92%
Table 3. Improvement in compute time for GMxB policy sets on a 72-core HPC cluster versus a desktop
Cluster Overhead
The HPC cluster overhead results for GMxB pricing calculations mirrored the results from the FX trades
testing (Figure 6). The trend shown here reiterates the finding that, in terms of reducing overhead, the
most efficient use of an HPC cluster is to run large sets of calculations on fewer cores.
The increase in overhead as more cores are added can most likely be explained by the same factors
that reduced cluster performance with more cores, as explained in Use Case 1:
Bandwidth between the head node and the compute nodes to receive results.
The head node’s ability to process results coming in from the compute nodes.
Figure 6. Percent of total computing time spent on HPC cluster overhead during GMxB policy pricing calculations
Conclusion
The test results presented in this paper clearly demonstrate the value that HPC brings to complex
derivative portfolio management. Numerix and Microsoft together provide a powerful, highly scalable
solution that helps financial services professionals to more efficiently and accurately price, value, and
manage risk based on available information.
Testing also clearly shows that adding more cores to an HPC cluster significantly increases
performance:
An eight-core cluster ran 10,000 FX trades in only 12.5 minutes.
Calculating the present value for 10,000 GMxB policies took only 2.6 minutes using 72 cores.
Using an HPC solution that is based on Numerix CrossAsset XL and Windows HPC Server 2008 R2 with
Windows HPC Services for Excel lets traders, risk managers, and actuarial professionals access the
most powerful grid computing capabilities available to the financial industry today. This also helps to
solve common challenges by:
Providing more timely and accurate information for daily risk management.
Capturing relevant derivative trading activity into a common framework with enhanced
consistency of valuations across asset classes.
Significantly reducing the time needed to run risk analyses on large, computationally intensive
derivative portfolios and bespoke deal types.
Meeting regulatory demands and new accounting standards for accurate, timely, and
consistent pricing, risk, and reporting measures enterprise-wide.
Resources
For more information, visit the following resources:
Numerix CrossAsset XL
http://www.numerix.com/crossasset
Windows HPC Server 2008 R2
http://www.microsoft.com/hpc/en/us/default.aspx