Search for the Sweet Spot v02



Search for the Sweet Spot

A brief examination into which of the current Intel® Xeon® processors offer the best value, and which make the best use of power.

This is a brief look at the range of Intel CPUs offered in a popular server product, the Dell R730. The R730 is a direct descendant of Dell’s 2450 server launched back in 2000, which was the company’s most popular server product in its day, offering a good mix of power, size and expandability. These 2U, two-socket servers are able to run almost any application, and they have advanced from dedicated single-application devices to hosts with as many as 36 CPU cores for virtualization. With all of the advances in technology over the years, there remains a pertinent question for the IT engineer and consultant: “Which is the best CPU for my particular use?”

To help answer that question, I’ve analyzed several ways to measure the trade-offs between performance, both theoretical and measured, and efficiency, measured as performance per watt of power consumed.

First, Intel provides a very wide assortment of eligible CPUs in this family, the E5-26XX v3 products that are the heart of the R730 design. This range of products is the result of a small number of actual designs, on the order of 3, that are then further sorted into bins by testing at the end of the manufacturing process (no, Intel doesn’t try to build 38 different designs). Dell has not adopted all of the possible CPUs for use in the R730, as it needs to select processors that offer the wide range of capabilities its customers want without building a massively expensive test matrix.

In terms of performance, Intel is rather mum these days about raw CPU performance, preferring instead to rely on system-level benchmarks provided by its customers. These preferred benchmarks include the SPEC suite of tests, including SPEC CPU2006 with its sub-tests for integer and floating-point performance, and SPECvirt_sc2013 for virtualization. While these tests are obviously valuable to the target audience for this paper, they also introduce far too many system-level variables to help here, where the focus is strictly on the CPU. I’m not advocating that one should ignore these system-level benchmarks, but anyone who has been closely involved in the process of submitting a system for this type of test knows that the results fall into the realm of “Lies, damn lies, and statistics.”

The sources of information that I’ve used in this paper are all publicly available: the Dell website (dell.com) and the PassMark site (cpubenchmark.net). The Dell website provided a description of the range of offered CPUs and their basic list pricing, while cpubenchmark.net provided the PassMark benchmark test results. The pricing information listed on the two sites did not match, so all pricing used herein is from the Dell website, since Dell will actually sell you the product at the stated price should you so desire.


Unfortunately, PassMark information is only available for a subset of the CPUs that Dell offers. This is readily understandable given the huge range of products available now from Intel, AMD, the ARM providers and more. As computers have miniaturized and found their way into more and more end-user products, like programmable deadbolts, there are now simply too many products to test properly. The manufacturers listed above are in an arms race for their corporate survival and have therefore decided not to rest on their laurels. As a result, there are new and improved CPUs, coming soon to a device near you, and not all of them will be tested either.

To try to fill in the gaps of the available PassMark information, I have extrapolated some numbers to serve as a wider basis of comparison. One of these numbers is an attempt to look at the basics of what provides usable work inside a processor, given that all of these products, labeled by Intel as “version 3,” share a common core design. With the assumption that most of the resulting servers will be running a modern operating system and application set that is capable of multi-threading, I looked at the total number of threads available for processing inside the CPU, as well as the base clock frequency of that particular SKU. By multiplying these numbers, I have arrived at what I call “Clock-Threads,” and the results show that this factor correlates closely with actual benchmark performance.
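
As a minimal sketch of the Clock-Threads arithmetic just described, the following Python snippet multiplies thread count by base clock for a few parts. The thread counts and clock speeds are taken from Table 1; the names `clock_threads`, `SPECS` and `CLOCK_THREADS` are illustrative choices made here, not part of the original analysis.

```python
def clock_threads(threads: int, base_clock_ghz: float) -> float:
    """Return Clock-Threads in GHz-threads: thread count times base clock."""
    return threads * base_clock_ghz


# (threads, base clock in GHz) copied from Table 1
SPECS = {
    "E5-2620 v3": (12, 2.4),
    "E5-2630 v3": (16, 2.4),
    "E5-2699 v3": (36, 2.3),
}

CLOCK_THREADS = {
    model: clock_threads(threads, ghz) for model, (threads, ghz) in SPECS.items()
}

for model, value in CLOCK_THREADS.items():
    print(f"{model}: {value:.1f} GHz-threads")
```

For the E5-2620 v3, for example, this works out to 12 × 2.4 = 28.8 GHz-threads. Turbo frequencies are deliberately left out, in keeping with the base-clock definition above.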

Most, but not all, of these CPUs are also capable of “turbo” mode, where the clock frequency can be temporarily increased depending on the current overall workload and power consumption. This factor can’t be pinned down much more closely than “Your mileage may vary,” so I have chosen to ignore it for the purposes of this paper. It may, however, contribute positively to the PassMark results.

Table 1: Basic Info and Pricing (source: Dell.com on 1/17/15 for R730)

| CPU Model | Cache | QPI (GT/s) | Turbo | HT | Max Mem (MHz) | Added Cost | Total Cost | Cores | Threads | Clock (GHz) | Nominal Power (TDP in Watts) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| E5-2603 v3 | 15M | 6.40 | No | No | 1600 | baseline* | $269.08 | 6 | 6 | 1.6 | 85 |
| E5-2609 v3 | 15M | 6.40 | No | No | 1600 | $96.55 | $365.63 | 6 | 6 | 1.9 | 85 |
| E5-2620 v3 | 15M | 8.00 | Yes | Yes | 1866 | $227.18 | $496.26 | 6 | 12 | 2.4 | 85 |
| E5-2623 v3 | 10M | 8.00 | Yes | Yes | 1866 | $205.89 | $474.97 | 4 | 8 | 3.0 | 105 |
| E5-2630 v3 | 20M | 8.00 | Yes | Yes | 1866 | $433.07 | $702.15 | 8 | 16 | 2.4 | 85 |
| E5-2630L v3 | 20M | 8.00 | Yes | Yes | 1866 | $369.18 | $638.26 | 8 | 16 | 1.8 | 55 |
| E5-2637 v3 | 15M | 9.60 | Yes | Yes | 2133 | $780.96 | $1,050.04 | 4 | 8 | 3.5 | 135 |
| E5-2640 v3 | 20M | 8.00 | Yes | Yes | 1866 | $674.46 | $943.54 | 8 | 16 | 2.6 | 90 |
| E5-2643 v3 | 20M | 9.60 | Yes | Yes | 2133 | $1,356.03 | $1,625.11 | 6 | 12 | 3.4 | 135 |
| E5-2650 v3 | 25M | 9.60 | Yes | Yes | 2133 | $866.15 | $1,135.23 | 10 | 20 | 2.3 | 105 |
| E5-2650L v3 | 30M | 9.60 | Yes | Yes | 2133 | $1,057.84 | $1,326.92 | 12 | 24 | 1.8 | 65 |
| E5-2660 v3 | 25M | 9.60 | Yes | Yes | 2133 | $1,107.54 | $1,376.62 | 10 | 20 | 2.6 | 105 |
| E5-2667 v3 | 20M | 9.60 | Yes | Yes | 2133 | $1,810.40 | $2,079.48 | 8 | 16 | 3.2 | 135 |
| E5-2670 v3 | 30M | 9.60 | Yes | Yes | 2133 | $1,313.43 | $1,582.51 | 12 | 24 | 2.3 | 120 |
| E5-2680 v3 | 30M | 9.60 | Yes | Yes | 2133 | $1,476.72 | $1,745.80 | 12 | 24 | 2.5 | 120 |
| E5-2683 v3 | 35M | 9.60 | Yes | Yes | 2133 | $1,576.12 | $1,845.20 | 14 | 28 | 2.0 | 120 |
| E5-2687W v3 | 25M | 9.60 | Yes | Yes | 2133 | $1,782.01 | $2,051.09 | 10 | 20 | 3.1 | 160 |
| E5-2690 v3 | 30M | 9.60 | Yes | Yes | 2133 | $1,824.60 | $2,093.68 | 12 | 24 | 2.6 | 135 |
| E5-2695 v3 | 35M | 9.60 | Yes | Yes | 2133 | $2,165.39 | $2,434.47 | 14 | 28 | 2.3 | 120 |
| E5-2697 v3 | 35M | 9.60 | Yes | Yes | 2133 | $2,548.77 | $2,817.85 | 14 | 28 | 2.6 | 145 |
| E5-2698 v3 | 40M | 9.60 | Yes | Yes | 2133 | $3,280.03 | $3,549.11 | 16 | 32 | 2.3 | 135 |
| E5-2699 v3 | 45M | 9.60 | Yes | Yes | 2133 | $4,273.98 | $4,543.06 | 18 | 36 | 2.3 | 145 |

Cache, QPI, Turbo, HT and Max Mem are taken from Dell’s full description of each part.

* cost for this part is based on the cost of adding a 2nd CPU to an R730 configuration

Now it’s time to dig into some of the relative comparisons, starting with the price per core, the calculation of clock-threads (the number of threads times the clock frequency in GHz), and then the number of clock-threads per dollar, with the clock frequency converted here to MHz to provide some normalization of the figures.

One thing that leaps out in this analysis is that the cost-per-core line in Chart 1 below is pretty spiky, with three parts in particular standing out with high costs per core: the E5-2637 v3, the E5-2643 v3 and the E5-2667 v3. What these parts have in common is relatively high clock rates coupled with relatively low core counts. They are intended for specialized uses, such as low-latency trading where milliseconds are worth millions of dollars, and applications licensed on a per-core basis, where the license fees are much higher than the cost of the CPU. In this latter case, what makes business sense is to deliver the highest possible clock frequencies to get the most use out of those very expensive license fees.

A fourth, smaller spike can be seen for the E5-2687W v3, which is in the product line as Intel’s way of saying, “Who cares about power consumption?” At 160 Watts of TDP, it is the power hog of the group, but it does offer the most cores that run at more than 3GHz. The rest of the system using this part would need to be lightly configured to avoid melting the chassis.
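
To make the price-per-core spikes concrete, here is a small Python sketch that divides the Table 1 total cost by the core count for the four parts discussed above, plus two mainstream neighbors for contrast. The prices and core counts come from Table 1; the names `PARTS`, `price_per_core` and `PRICE_PER_CORE` are hypothetical and used only for illustration.

```python
# (total cost in USD, physical cores) copied from Table 1
PARTS = {
    "E5-2637 v3": (1050.04, 4),
    "E5-2643 v3": (1625.11, 6),
    "E5-2667 v3": (2079.48, 8),
    "E5-2687W v3": (2051.09, 10),
    "E5-2640 v3": (943.54, 8),    # mainstream neighbor, for contrast
    "E5-2650 v3": (1135.23, 10),  # mainstream neighbor, for contrast
}


def price_per_core(total_cost_usd: float, cores: int) -> float:
    """Return the list price divided by the number of physical cores."""
    return total_cost_usd / cores


PRICE_PER_CORE = {
    model: price_per_core(cost, cores) for model, (cost, cores) in PARTS.items()
}

# Print the most expensive cores first.
for model, value in sorted(PRICE_PER_CORE.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:12s} ${value:7.2f} per core")
```

With these inputs the three high-clock parts land at roughly $260 to $271 per core and the E5-2687W v3 at about $205, against about $114 for the E5-2650 v3.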

Chart 1: Comparison of Price/Core, Cost of Clock-Threads and Clock-Threads/Dollar
[Chart not reproduced. Plotted series: Clock-Threads (Gig), C-T (Meg)/$ and $ per Core; vertical axis 0 to 300.]


Chart 2: Comparisons of Chart 1 - Without the Parts Using High-Speed Cores
[Chart not reproduced. Plotted series: Clock-Threads (Gig), C-T (Meg)/$ and $ per Core; vertical axis 0 to 300.]

Deserving of honorable mention in the high-speed core category is the E5-2623 v3, which stands out as a spike in Chart 2 only because the other high-speed core parts mentioned above have been removed. It offers only 4 cores, but they are clocked at 3.0GHz to make it the gateway processor to optimizing for fast clocks and a low number of cores.

The other points on these charts that bear examination are those that are low relative to their neighbors on the cost per core and clock-threads scale or relatively high on the clock-threads per dollar scale. These points could indicate relative bargains in the lineup. Of these, let’s first look at the low-power parts, the E5-2630L v3 and the E5-2650L v3. Historically, Intel has charged a premium for low power consumption, so I was surprised initially to see these showing low inflection points on the value-oriented chart. However, on closer examination they are somewhat out of position because their cores run at a slower clock speed of 1.8GHz. They also represent small dips in the clock-threads per dollar scale, meaning they are not great bargains in the pure performance per dollar sense. Compared to its immediate neighbor, the E5-2630 v3, the E5-2630L v3 saves 30 Watts of TDP per socket (55W versus 85W), or 60 Watts per fully-populated R730. At an average US cost of 10.55 cents per kWh for commercial customers (http://www.eia.gov/electricity/monthly/update/end_use.cfm, published 1/26/2015) and assuming the system runs continuously for 5 years, this would suggest savings of $277.25 in electricity costs over that lifetime of usage, or about 22% of the purchase cost of the CPUs. Experience has shown these parts generally find weak acceptance in the marketplace, mostly because data center managers have benefitted so greatly from collapsing the number of servers via virtualization that power savings become a minor consideration in the purchase process. It doesn’t help the case for these parts that the power bills in most organizations are paid by Facilities and there is no direct power meter on the data center, so power consumption is a small consideration up to the point that power or cooling to the entire data center is maxed out.
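
The electricity estimate above can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions that are mine rather than the paper’s: 365-day years, a constant electricity rate, and the full 60 Watt TDP difference being realized around the clock. The function and variable names are illustrative.

```python
def electricity_savings_usd(watts_saved: float, years: float, cents_per_kwh: float) -> float:
    """Cost of the energy saved by drawing `watts_saved` less, running continuously."""
    hours = years * 365 * 24            # assumes 365-day years
    kwh_saved = watts_saved / 1000 * hours
    return kwh_saved * cents_per_kwh / 100


# 30 W per socket x 2 sockets, 5 years, 10.55 cents/kWh (figures from the text)
SAVINGS = electricity_savings_usd(watts_saved=60, years=5, cents_per_kwh=10.55)

# Two E5-2630L v3 CPUs at the Table 1 total cost of $638.26 each
CPU_COST = 2 * 638.26
SHARE = SAVINGS / CPU_COST

print(f"Savings: ${SAVINGS:.2f}")       # Savings: $277.25
print(f"Share of CPU cost: {SHARE:.1%}")  # Share of CPU cost: 21.7%
```

Actual draw rarely sits at TDP, so the real-world figure is likely to differ; the sketch only confirms the arithmetic behind the $277.25 and roughly 22% figures.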

If you want lots of cores, Intel will be happy to oblige, and so will their shareholders. In particular, once you get into the E5-269X v3 range of the series, the price per core starts to really take off. Notice that the slope of the cost/core line becomes steeper than that of the clock-threads product. These parts could still be worth consideration, however, if packing lots of cores (and lots of RAM to go with them) into a smaller number of systems allows the total system count in a project to go down, given that each system would most commonly get a copy of VMware and Microsoft Windows Server 2012 Datacenter Edition, which together can cost $10,000 to $12,000 or more per system.

Points on the clock-threads per dollar plot that are slightly higher than those around them suggest CPUs that represent good value. The overall leader in this regard is the E5-2620 v3 with 6 cores, topping out at 58 clock-threads (in MHz) per dollar. Right behind it at 54.7 is the E5-2630 v3 with 8 cores. The other part worth noting in this regard is the E5-2650 v3, which is the lowest point in the lineup that allows full bandwidth access to the new DDR4 memory at 2133 MT/sec, as long as the third row of DIMMs in the server is not populated. For the considerations presented so far, these three parts represent the everyday value leaders.
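
The value figures quoted above can be checked with a short Python sketch that converts Clock-Threads to MHz and divides by the Table 1 total cost. Only four parts are included to keep it brief; the names `PARTS`, `ct_mhz_per_dollar` and `RANKED` are hypothetical.

```python
# (threads, base clock in GHz, total cost in USD) copied from Table 1
PARTS = {
    "E5-2620 v3": (12, 2.4, 496.26),
    "E5-2630 v3": (16, 2.4, 702.15),
    "E5-2650 v3": (20, 2.3, 1135.23),
    "E5-2699 v3": (36, 2.3, 4543.06),
}


def ct_mhz_per_dollar(threads: int, ghz: float, cost_usd: float) -> float:
    """Clock-Threads expressed in MHz-threads, divided by list price."""
    return threads * ghz * 1000 / cost_usd


# Sort from best value to worst.
RANKED = sorted(
    ((model, ct_mhz_per_dollar(*spec)) for model, spec in PARTS.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for model, value in RANKED:
    print(f"{model:11s} {value:5.1f} MHz-threads per dollar")
```

This reproduces the roughly 58 and 54.7 figures for the E5-2620 v3 and E5-2630 v3, and shows how far the top-end E5-2699 v3 falls on this measure (about 18).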

Chart 3: PassMark (CPU Mark) Benchmark Scores
[Chart not reproduced. Plotted series: PassMark Score; vertical axis 0 to 30,000.]


Chart 3 shows the test results for those processors of interest that have been tested and published using the PassMark CPU Test, as of 2/10/2015, at this link: http://www.cpubenchmark.net/high_end_cpus.html. While test results are available for a staggering number of CPUs, not all of the parts that Dell offers for the R730 have been tested, so Chart 3 has fewer entries than Charts 1 and 2.

The details on what this test actually does are available here: https://www.cpubenchmark.net/cpu_test_info.html. One key point that PassMark makes about this test suite is worth noting: the benchmark software is designed to run as many simultaneous instances of the test as the test system has logical processors, meaning that Hyper-Threading logical cores and physical cores are all exercised at the same time. Here’s a direct quote from the PassMark site:

“To ensure that the full CPU power of a PC system is realized, PerformanceTest runs each CPU test on all available CPUs. Specifically, PerformanceTest runs one simultaneous CPU test for every logical CPU (Hyper-threaded); physical CPU core (dual core) or physical CPU package (multiple CPU chips). So hypothetically if you have a PC that has two CPUs, each with dual cores that use hyper-threading then PerformanceTest will run eight simultaneous tests.”

Chart 4: Normalized Comparison of Clock-Threads and PassMark Scores
[Chart not reproduced. Plotted series: Clock-Threads Norm. and PassMark Norm.; vertical axis 0 to 3.5.]


In Chart 4, I compare the relative results of the clock-thread calculation to the reported PassMark results by normalizing both sets of values to those of the E5-2620 v3. With one exception, the results correlate closely across the spectrum, with some deviation in the higher range of the results. This deviation is not unexpected: adding more and more cores into a given package cannot give completely linear improvements in performance, as those cores share a finite number of on-chip resources used in their work, such as cache and memory controllers. This has sometimes been called the “SMP Tax,” and is a known and expected outcome when scaling processors inside a given system. The simplistic clock-thread calculation does not make allowance for this effect.

The one exception noted above where the charts deviate is for the E5-2687W v3. In looking into the results as shown in Chart 3, there appears to be something amiss with the test results on this part. Combining the entries from Chart 3 and Table 1, we can see these relevant comparisons:

| CPU Model | Cores | Threads | Clock (GHz) | PassMark Score |
|---|---|---|---|---|
| E5-2650 v3 | 10 | 20 | 2.3 | 15660 |
| E5-2660 v3 | 10 | 20 | 2.6 | 16472 |
| E5-2687W v3 | 10 | 20 | 3.1 | 15848 |

In spite of having the same number of cores and threads as the E5-2660 v3 and a clock that is 500MHz faster, the E5-2687W v3 actually shows lower performance. Since these CPU tests are performed in some unstated system, I believe it’s possible that whichever system was used to test this particular part may not have been capable of supplying its full power requirements (TDP = 160W). These unexpected results have been submitted to PassMark to see if they can be resolved.
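
The size of the anomaly can be illustrated by normalizing the three 10-core parts against one another. Note the assumption: the paper normalizes to the E5-2620 v3, whose PassMark score is not reproduced in the text, so this Python sketch normalizes to the E5-2650 v3 instead, purely for illustration. The names `ROWS`, `BASE`, `normalized` and `GAP` are hypothetical.

```python
# (threads, base clock in GHz, PassMark score) from the comparison table above
ROWS = {
    "E5-2650 v3": (20, 2.3, 15660),
    "E5-2660 v3": (20, 2.6, 16472),
    "E5-2687W v3": (20, 3.1, 15848),
}
BASE = "E5-2650 v3"  # assumption: the paper itself normalizes to the E5-2620 v3


def normalized(rows: dict, base: str) -> dict:
    """Return {model: (clock-threads ratio, PassMark ratio)} relative to `base`."""
    base_threads, base_ghz, base_score = rows[base]
    base_ct = base_threads * base_ghz
    return {
        model: (threads * ghz / base_ct, score / base_score)
        for model, (threads, ghz, score) in rows.items()
    }


NORM = normalized(ROWS, BASE)

# Gap between what clock-threads predicts and what PassMark reports.
GAP = {model: ct - pm for model, (ct, pm) in NORM.items()}

for model, (ct, pm) in NORM.items():
    print(f"{model:12s} clock-threads {ct:.3f}  PassMark {pm:.3f}  gap {GAP[model]:+.3f}")
```

On this basis the E5-2660 v3 trails its clock-threads prediction by about 0.08, while the E5-2687W v3 trails by about 0.34, which is the outlier the text describes.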

Chart 5: Performance Results per Dollar
[Chart not reproduced. Plotted series: C-T (Meg)/$ and PassMark /$; vertical axis 0 to 70.]


There are no real surprises in Chart 5, as it confirms what has already been pointed out:

1. The best value on a performance basis is at the low end.
2. Absolute value under both measures declines regularly as you move up the product line.
3. The parts optimized for high-speed cores are bad values overall and are intended for use in the previously described situations, driven by the application performance needs or the cost of software licensing.

Chart 6: Performance per Watt (TDP)

What Chart 6 shows is that, relative to TDP, the most power-efficient parts are at the high end, again with the exception of the E5-2687W v3, which makes sense given that it is the known power hog. The increasing efficiency as cores are added comes from the fact that Intel generally doesn’t allow TDP to increase linearly with core count, and tries to put a cap on it so that system builders can have an easier time designing servers with adequate cooling in today’s compact chassis configurations. Therefore, adding more cores into a given TDP envelope results in higher efficiency overall, even if the cores start to miss out on their full potential performance because of the SMP tax noted earlier.

Painfully missing from this chart are the two low-power parts, the E5-2630L v3 and E5-2650L v3. They have been omitted from the chart as there are no PassMark results currently available for them. In terms of the clock-thread calculation, they did stand out from their neighbors in the product line, and in fact the E5-2650L v3 showed the highest overall result at 665 clock-threads (in MHz) per watt, comfortably better than any other part in this lineup. I’m hoping that someone will test this part and provide the results.
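
The clock-threads per watt figure can be sketched the same way as the per-dollar figure, dividing by TDP instead of price. The Python below uses Table 1 values for the two low-power parts and two high-end parts for contrast; TDP is a nominal rating rather than measured draw, and the names `PARTS`, `ct_mhz_per_watt` and `PER_WATT` are illustrative.

```python
# (threads, base clock in GHz, TDP in Watts) copied from Table 1
PARTS = {
    "E5-2630L v3": (16, 1.8, 55),
    "E5-2650L v3": (24, 1.8, 65),
    "E5-2687W v3": (20, 3.1, 160),
    "E5-2699 v3": (36, 2.3, 145),
}


def ct_mhz_per_watt(threads: int, ghz: float, tdp_watts: float) -> float:
    """Clock-Threads expressed in MHz-threads, divided by nominal TDP."""
    return threads * ghz * 1000 / tdp_watts


PER_WATT = {model: ct_mhz_per_watt(*spec) for model, spec in PARTS.items()}
BEST = max(PER_WATT, key=PER_WATT.get)

for model, value in PER_WATT.items():
    print(f"{model:12s} {value:6.1f} MHz-threads per watt")
print(f"Most efficient by this measure: {BEST}")
```

The E5-2650L v3 comes out at about 665, ahead of the 18-core E5-2699 v3 at about 571, while the E5-2687W v3 sits far lower at about 388.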

Prepared by: Brian D. Allison, published on 2/12/2015. All copyrights and trademarks shown are property of their respective owners. This article may be shared freely with attribution to the author.

[Chart 6 not reproduced. Plotted series: C-T (Meg)/W and PassMark /W; vertical axis 0 to 600.]