2012 ATLAS Technical Interchange Meeting, Annecy, France
Stephen Gray, Dell Global CERN/LHC Technologist, +1.512.574.5032
Dell LHC Program
Building a “Bulldozer” Processor
• Each processor die is composed of 4 “Bulldozer” modules
• Module divisions are transparent to the shared hardware, the operating system, and applications
• The modular architecture speeds chip development and increases product flexibility
Server: “Interlagos”, 16 cores (2 dies); “Valencia”, 8 cores (1 die)
Client: “Zambezi”, 8 cores (1 die)
Diagram: a die with four modules, a shared L3 cache, northbridge/HyperTransport (NB/HT) links, and the memory controller; 8MB shared L3 cache per die
DELL/AMD CONFIDENTIAL
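The slide's figures pin down the per-module core count: 16 cores over 2 dies of 4 modules each gives 2 cores per module. A minimal Python sketch of that arithmetic follows. The product table is copied from the slide; the 4-socket total assumes a C6145 tray populated with four Interlagos parts (such as the Opteron 6276 used in the later tests).

```python
# Core and cache counts implied by the Bulldozer slide (sketch).
# Cores per module is derived from the slide's numbers, not stated on it.
MODULES_PER_DIE = 4   # "Each processor die is composed of 4 Bulldozer modules"
L3_MB_PER_DIE = 8     # "8MB Shared L3 Cache per die"

# (cores, dies) per product, as listed on the slide
PRODUCTS = {
    "Interlagos": (16, 2),
    "Valencia": (8, 1),
    "Zambezi": (8, 1),
}


def cores_per_module(cores: int, dies: int) -> int:
    """Cores per Bulldozer module implied by a product's core and die counts."""
    modules = dies * MODULES_PER_DIE
    if cores % modules:
        raise ValueError("core count does not divide evenly into modules")
    return cores // modules


def l3_mb(dies: int) -> int:
    """Total shared L3 per package, assuming every die carries 8MB."""
    return dies * L3_MB_PER_DIE


if __name__ == "__main__":
    for name, (cores, dies) in PRODUCTS.items():
        print(f"{name}: {dies * MODULES_PER_DIE} modules, "
              f"{cores_per_module(cores, dies)} cores/module, {l3_mb(dies)}MB L3")
    # A 4-socket tray populated with Interlagos parts
    print("4 x Interlagos:", 4 * PRODUCTS["Interlagos"][0], "cores")
```

Every product on the slide works out to 2 cores per module, and a 4-socket Interlagos tray to 64 cores. That matches the "64C" configurations benchmarked later.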
Romley EP Platform
Diagram: two Sandy Bridge CPUs linked by QPI, each with four DDR3 channels and PCIe 3.0 x8 links, plus a Patsburg PCH attached over DMI2
• Memory: DDR3 and DDR3L; RDIMMs, UDIMMs, and LRDIMMs; 4 channels per socket, up to 3 DIMMs per channel (DPC); speeds up to DDR3-1600
• PCI Express 3.0: 40 lanes per socket; extra Gen 2 x4 on the 2nd CPU
• Patsburg PCH, optimized for servers and workstations; integrated storage with up to 8 ports of 6Gb/s SAS, RAID 5 optional
• Sandy Bridge CPUs: up to 8 cores per socket, with up to 20MB of cache
• QPI: 2 QPI links with bandwidth up to 8 GT/s
DELL/Intel CONFIDENTIAL
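As a rough sanity check on the platform figures, the theoretical peak bandwidths can be worked out directly. The sketch below relies on three constants that are not on the slide: 8-byte DDR3 channels, PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding, and 2 data bytes per QPI transfer per direction. Sustained bandwidth in practice is lower.

```python
# Theoretical peak bandwidth for the Romley EP figures (sketch).
# Assumed constants (not on the slide): 64-bit (8-byte) DDR3 channels,
# PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding, and 16 data bits
# per QPI transfer in each direction.


def ddr3_peak_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s (10^9 bytes/s)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def pcie3_peak_gbs(lanes: int) -> float:
    """Peak PCIe 3.0 payload bandwidth per direction in GB/s."""
    gt_per_s = 8e9
    bytes_per_lane = gt_per_s * (128 / 130) / 8  # strip encoding, bits -> bytes
    return bytes_per_lane * lanes / 1e9


def qpi_peak_gbs(gt_per_s: float, data_bytes: int = 2) -> float:
    """Peak bandwidth per direction for one QPI link in GB/s."""
    return gt_per_s * 1e9 * data_bytes / 1e9


if __name__ == "__main__":
    print(f"DDR3-1600 x 4 channels: {ddr3_peak_gbs(1600, 4):.1f} GB/s per socket")
    print(f"PCIe 3.0 x 40 lanes:    {pcie3_peak_gbs(40):.1f} GB/s per direction")
    print(f"QPI at 8 GT/s:          {qpi_peak_gbs(8):.1f} GB/s per direction per link")
```

Under these assumptions, each socket peaks at 51.2 GB/s of memory bandwidth and about 39.4 GB/s of PCIe 3.0 bandwidth per direction, and each QPI link peaks at 16 GB/s per direction.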
All HS06 Tests Before 12/2011
Chart: HEPSPEC06/system vs. cores present (6, 8, 12, 16, 32, 64) for Intel Sandy Bridge and AMD Interlagos
Notes:
* All tests are 32-bit, with hyperthreading disabled and clock speed-up enabled
* Multiple tests on the same processor type are averaged
* The 32-core AMD system is 3.0 GHz; all other AMD systems are 2.3 GHz
* Intel 6- and 12-core systems are 2.0 GHz, the 8-core is 1.6 GHz, and the 32-core is 2.7 GHz
* The 64-core Intel system is a 4-socket 2.4 GHz R820
Core Control
Chart: HEPSPEC06 vs. cores/threads
Cores/Threads | 1 | 2 | 4 | 8 | 16 | 32 | 64
AMD Interlagos - numactl binding | 11 | 21 | 43 | 86 | 170 | 337 | 504
Intel Sandy Bridge - BIOS core downing | 0 | 0 | 0 | 101 | 175 | 358 | 566
Intel Sandy Bridge - numactl binding | 21 | 27 | 51 | 93 | 161 | 316 | 536
Intel Sandy Bridge - numactl binding, HT off | 22 | 41 | 74 | 128 | 255 | 502 | 0
Notes:
- All tests used RHEL 6.2 and GCC 4.4.5
- Intel SB numbers are from an R820 with 4 x 2.4GHz 8-core engineering processors and HT enabled
- AMD Interlagos numbers come from a C6145 with 4 x 6276 2.3GHz production processors
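The Core Control results compare numactl binding with BIOS core downing. The sketch below shows one way to generate the per-copy numactl command lines for a run of N benchmark copies: each copy is pinned to one logical CPU with `--physcpubind` and to its node's memory with `--membind`. The node-to-CPU map is an input you would read from sysfs or `numactl --hardware`. The launcher `./run_copy.sh` is a hypothetical placeholder and is not part of HEP-SPEC06.

```python
# Hypothetical helper: build numactl command lines that pin benchmark copies.
# "./run_copy.sh" is a placeholder launcher, not part of HEP-SPEC06.


def numactl_commands(node_cpus: dict[int, list[int]], copies: int,
                     cmd: str = "./run_copy.sh") -> list[str]:
    """Pin copy i to one logical CPU and to the memory of that CPU's node.

    CPUs are taken round-robin across nodes, so a partial run (e.g. 4 copies
    on a 4-socket box) puts one copy on each socket instead of filling node 0.
    """
    queues = {node: list(cpus) for node, cpus in sorted(node_cpus.items())}
    order: list[tuple[int, int]] = []
    while any(queues.values()):
        for node, cpus in queues.items():
            if cpus:
                order.append((node, cpus.pop(0)))
    if copies > len(order):
        raise ValueError(f"{copies} copies requested but only {len(order)} CPUs")
    return [
        f"numactl --physcpubind={cpu} --membind={node} {cmd} {i}"
        for i, (node, cpu) in enumerate(order[:copies])
    ]


if __name__ == "__main__":
    # Toy two-node map, for illustration only
    for line in numactl_commands({0: [0, 1], 1: [2, 3]}, copies=3):
        print(line)
```

The round-robin order is a design choice of this sketch. On hyperthreaded systems, check which logical CPUs are siblings before settling on an order, because two copies placed on sibling threads share one physical core.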
Very Cool Scalability
Chart: speed-up (0.0 to 1.2) vs. cores/threads (1 to 64) for AMD Interlagos - 4 Socket Numactl Binding and Intel Sandy Bridge - 4 Socket Numactl Binding
Notes:
- RHEL 6.2 and GCC 4.4.5 were used for all tests
- Sandy Bridge numbers are from an R820 with 4 sockets of 2.4GHz 8-core engineering processors with HT enabled
- Interlagos numbers are from a C6145 with 1 tray of 4-socket Opteron 6276 2.3GHz production processors
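One common way to quantify throughput scaling is per-copy efficiency, HS06(n) / (n x HS06(1)): 100% means each added copy contributes as much as the first. The slide does not say exactly how its "Speed Up" axis was normalized, so treat this Python sketch as an assumption. The sample scores are hypothetical.

```python
# Sketch: per-copy scaling efficiency from throughput results.
# Assumed definition: efficiency(n) = HS06(n) / (n * HS06(1)).


def scaling_efficiency(hs06_by_n: dict[int, float]) -> dict[int, float]:
    """Efficiency for each copy count relative to the single-copy score."""
    base = hs06_by_n[1]
    return {n: score / (n * base) for n, score in sorted(hs06_by_n.items())}


if __name__ == "__main__":
    # Hypothetical scores, for illustration only
    sample = {1: 10.0, 2: 20.0, 4: 38.0, 8: 72.0}
    for n, eff in scaling_efficiency(sample).items():
        print(f"{n:>2} copies: {eff:.1%}")
```

For the sample input, efficiency stays at 100% up to 2 copies, then falls to 95% at 4 and 90% at 8.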
AMD C6145 Interlagos Map
Intel Sandy Bridge: Get the Map Right
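The topology maps on these slides did not survive extraction. On Linux, the kernel exposes the same node-to-CPU map in sysfs: each `/sys/devices/system/node/nodeN/cpulist` file lists the logical CPUs of one NUMA node, and `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` lists hyperthread siblings. This is a minimal Python sketch that reads the node map. It assumes a Linux system and returns an empty map elsewhere.

```python
# Sketch: read the NUMA node -> logical CPU map that Linux exposes in sysfs.
import glob
import os
import re


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist such as '0-3,8,10-11' into [0, 1, 2, 3, 8, 10, 11]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # a node without CPUs has an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_map(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node number to the logical CPUs it contains."""
    result = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        node = int(re.search(r"node(\d+)$", path).group(1))
        with open(os.path.join(path, "cpulist")) as f:
            result[node] = parse_cpulist(f.read())
    return dict(sorted(result.items()))


if __name__ == "__main__":
    for node, cpus in numa_map().items():
        print(f"node {node}: {cpus}")
```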
The Problem Is an Old One
• New x86 systems think they are SMP
• As many CPUs in 2U as an HP SuperDome in a 42U rack (circa 2004)
• One must relearn process/thread binding
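Binding can also be done from inside a process instead of through a numactl wrapper. This Python sketch uses `os.sched_setaffinity` and `os.sched_getaffinity`, which are available on Linux only. The call restricts the calling thread, and threads or child processes started afterwards inherit the mask. Memory placement is not set here; `numactl --membind` covers that. The helper name `pin_to` is illustrative.

```python
# Sketch: in-process CPU binding on Linux, the counterpart of
# `numactl --physcpubind`. Memory placement is not changed.
import os


def pin_to(cpus: set[int]) -> set[int]:
    """Restrict the calling thread to `cpus`; return the previous mask."""
    previous = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    return previous


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    old = pin_to({allowed[0]})
    print("now running on:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, old)  # restore the original mask
```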
OS Effect On HS06
Chart: HEPSPEC06 by operating system (SL 5.5/5.7, 6.2, RHEL 7A wo tune, RHEL 7A w tune, RHEL 7A w tune & avx) for three systems:
• Intel Westmere: C6100 with 2 x Intel X5670 2.66GHz 6C
• Intel Sandy Bridge: R820 with 2 x Intel SB 2.4GHz 8C
• AMD Interlagos: C6145 with 4 x AMD 6276 2.3GHz 16C
Notes:
- On the R820, RHEL 7A with GCC 4.6.2 compiled all HEPSPEC06 benchmarks except Deal II (see whitepaper)
- On the C6145, RHEL 7A with GCC 4.6.2 compiled all HEPSPEC06 benchmarks except Deal II (see whitepaper)
- The "w tune" designation means the compiler flags in the linux64-cern.cfg file were modified to include -march=bdver1 for AMD's Interlagos and -march=corei7 for Intel's Sandy Bridge
- No patching or tuning was applied to the OS
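The "w tune" change amounts to appending a CPU-specific `-march` flag to the compiler flags. This Python sketch picks the flag from the `vendor_id` field of `/proc/cpuinfo`. The two flag values come from the notes above. The vendor-based selection and the `-O2` base flags are assumptions for illustration and are not the contents of linux64-cern.cfg. Mapping every AMD part to `bdver1` is only right for Bulldozer-class CPUs such as Interlagos.

```python
# Sketch of the "w tune" flag change. Flag values are from the slide notes;
# the vendor-based selection and the "-O2" base flags are hypothetical.

MARCH_BY_VENDOR = {
    "AuthenticAMD": "-march=bdver1",  # AMD Interlagos (Bulldozer) only
    "GenuineIntel": "-march=corei7",  # as used for Sandy Bridge in the tests
}


def cpu_vendor(cpuinfo_text: str) -> str:
    """Return the first vendor_id field from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "vendor_id":
            return value.strip()
    raise ValueError("no vendor_id line found")


def tuned_flags(base_flags: str, cpuinfo_text: str) -> str:
    """Append the vendor's -march flag unless one is already present."""
    if "-march=" in base_flags:
        return base_flags
    return f"{base_flags} {MARCH_BY_VENDOR[cpu_vendor(cpuinfo_text)]}"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(tuned_flags("-O2", f.read()))
    except (OSError, ValueError, KeyError) as exc:
        print("could not determine tuning flag:", exc)
```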
Newer OSes Vs SL 5.5/5.7
Chart: Operating System Perform... (y-axis: percent increase, SL or RHEL 5.5/5.7 vs. newer OS)
Interlagos, RHEL 7A | 186%
Sandy Bridge, RHEL 7A | 201%
Interlagos, RHEL 6.2 | 118%
Sandy Bridge, RHEL 6.2 | 106%
Westmere, SL 6.2 | 100%
Notes:
- No tuning was performed on RHEL 7A runtimes
- The standard linux32_cern.cfg was used for all testing
- AMD Interlagos numbers are from a C6145 tray with 4 x AMD 6276 2.3 GHz 16-core processors
- Intel Sandy Bridge numbers are from an R820 with a 2.4 GHz 8-core processor
- Intel Westmere numbers are from a C6100 tray with Intel X5650 2.66 GHz 6-core processors
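The chart's values read most naturally as performance relative to the SL/RHEL 5.5/5.7 result, with 100% meaning "unchanged", which is how the Westmere SL 6.2 bar appears. Under that assumption, this short Python sketch converts the plotted values to percent increase.

```python
# Sketch: convert "percent of the 5.5/5.7 baseline" into percent increase.
# Assumption: 100% means the same HS06 as on SL/RHEL 5.5/5.7.


def percent_of_baseline(new_hs06: float, base_hs06: float) -> float:
    """HS06 on the newer OS as a percentage of the 5.5/5.7 result."""
    return 100.0 * new_hs06 / base_hs06


def increase_from_percent(pct_of_baseline: float) -> float:
    """Convert percent-of-baseline to percent increase (186% -> 86%)."""
    return pct_of_baseline - 100.0


if __name__ == "__main__":
    chart = {"Inter RHEL 7A": 186, "SB RHEL 7A": 201, "Inter RHEL 6.2": 118,
             "SB RHEL 6.2": 106, "Westmere SL 6.2": 100}
    for label, pct in chart.items():
        print(f"{label}: +{increase_from_percent(pct):.0f}%")
```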
40,000 HS06 Target: Servers Required
System OS | Systems required
Sandy Bridge, RHEL 7A w tune (64T) | 68
Interlagos, RHEL 7A w tune (64C) | 73
Sandy Bridge, RHEL 7A wo tune (64T) | 68
Interlagos, RHEL 7A wo tune (64C) | 74
Interlagos, RHEL 6.2 (64C) | 80
Interlagos, SL 5.7 (64C) | 93
Sandy Bridge, RHEL 6.2 (64T) | 129
Sandy Bridge, SL 5.7 (64T) | 137
Westmere, SL 5.5/6.2 (12C) | 202
Notes:
- The standard linux64_cern.cfg was used for SL 5.7 and RHEL 6.2
- AMD Interlagos numbers are from 4 x 2.3GHz 16c processors
- Intel Sandy Bridge numbers are from 4 x 2.4GHz 8c processors
- Intel Westmere numbers are from 2 x 2.66 GHz 6c processors
- Hyper-threading was enabled on all Intel testing
- HS06 numbers are based on total system cores/threads
- The "w tune" designation means the compiler flags in the linux64-cern.cfg file were modified to include -march=bdver1 for AMD's Interlagos and -march=corei7 for Intel's Sandy Bridge
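Sizing for a capacity target is a division followed by rounding, and this Python sketch shows it. It rounds up on the assumption that only whole systems can be bought. The per-system HS06 scores in the example are hypothetical.

```python
# Sketch: systems needed to reach an HS06 capacity target.
# Assumption: whole systems only, so the quotient is rounded up.
import math

TARGET_HS06 = 40_000  # from the slide


def systems_required(target_hs06: float, hs06_per_system: float) -> int:
    if hs06_per_system <= 0:
        raise ValueError("hs06_per_system must be positive")
    return math.ceil(target_hs06 / hs06_per_system)


if __name__ == "__main__":
    for per_system in (500.0, 600.0):  # hypothetical per-system scores
        print(per_system, "HS06/system ->", systems_required(TARGET_HS06, per_system))
```

With these inputs, 500 HS06 per system needs exactly 80 systems, while 600 HS06 per system needs 67 because the partial system is rounded up.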
Walk Away
• Intel Sandy Bridge is fast (Porsche GT3)
• Must learn to use numactl to bind threads
• Expensive: Intel = $18,362.22 for ~1000 HS06
Intel solution: Dell PowerEdge C6220, $18.36/HS06, 8 x E5-2670 2.6GHz 8C, 128GB 1600MHz (total RAM per C6220, 2GB/core), 8 x 500GB drives
• Interlagos (Volkswagen GTI)
• Must learn to use numactl to bind threads
• For some applications you must turn off half the cores
• Cheaper: AMD = $11,011.65 for ~1000 HS06
AMD solution: Dell PowerEdge C6145, $11.01/HS06, 8 x 6276 2.3GHz 16C, 256GB 1600MHz (total RAM per C6145, 2GB/core), 8 x 500GB drives
• New operating systems and GCC are your friends
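The cost comparison reduces to price divided by HS06. The Python sketch below reproduces the $/HS06 figures from the chassis prices and approximate scores on this slide. It also prices the 40,000 HS06 target from the earlier sizing slide, assuming whole chassis at ~1000 HS06 each; because the per-chassis scores are approximate, treat the totals as rough.

```python
# Sketch: cost per HS06 and cost to reach a capacity target, using the
# Walk Away chassis figures (HS06 per chassis is approximate, "~1000").
import math

OPTIONS = {
    "Intel C6220": (18362.22, 1000),
    "AMD C6145": (11011.65, 1000),
}


def cost_per_hs06(price: float, hs06: float) -> float:
    return price / hs06


def cost_for_target(price: float, hs06: float, target: float) -> float:
    """Cost of buying whole chassis until the HS06 target is met."""
    return math.ceil(target / hs06) * price


if __name__ == "__main__":
    for name, (price, hs06) in OPTIONS.items():
        print(f"{name}: ${cost_per_hs06(price, hs06):.2f}/HS06, "
              f"${cost_for_target(price, hs06, 40_000):,.2f} for 40,000 HS06")
```

At ~1000 HS06 per chassis, the target takes 40 chassis of either type. That comes to about $734,488.80 for the C6220 option and $440,466.00 for the C6145 option.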