Dezső Sima
December 2011
(Ver. 1.5) Sima Dezső, 2011
Platforms II.
3. Platform architectures
Contents

3.1. Design space of the basic platform architecture
3.2. DT platforms
  3.2.1. Design space of the basic architecture of DT platforms
  3.2.2. Evolution of Intel’s home user oriented multicore DT platforms
  3.2.3. Evolution of Intel’s business user oriented multicore DT platforms
3.3. DP server platforms
  3.3.1. Design space of the basic architecture of DP server platforms
  3.3.2. Evolution of Intel’s low cost oriented multicore DP server platforms
  3.3.3. Evolution of Intel’s performance oriented multicore DP server platforms
3.4. MP server platforms
  3.4.1. Design space of the basic architecture of MP server platforms
  3.4.2. Evolution of Intel’s multicore MP server platforms
  3.4.3. Evolution of AMD’s multicore MP server platforms
3.1. Design space of the basic platform architecture

3.1 Design space of the basic platform architecture (1)

Platform architecture

• Architecture of the processor subsystem
  • Interpreted only for DP/MP systems
  • In SMPs: specifies the interconnection of the processors and the chipset
  • In NUMAs: specifies the interconnections between the processors

• Architecture of the memory subsystem
  Specifies
  • the point and
  • the layout
  of the interconnection

• Architecture of the I/O subsystem
  Specifies the structure of the I/O subsystem (will not be discussed)

Example: Core 2/Penryn based MP platform
(Figure: four processors connected to the MCH by individual buses (FSBs); the chipset consists of two parts, designated as the MCH and the ICH; memory is attached to the MCH; there are serial FB-DIMM channels.)
The notion of basic platform architecture

Platform architecture
• Architecture of the processor subsystem
• Architecture of the memory subsystem
• Architecture of the I/O subsystem

The basic platform architecture covers the architecture of the processor subsystem and of the memory subsystem, but not that of the I/O subsystem.

3.1 Design space of the basic platform architecture (2)
Architecture of the processor subsystem

Interpreted only for DP and MP systems. The interpretation depends on whether the multiprocessor system is an SMP or a NUMA:

• SMP systems: scheme of attaching the processors to the rest of the platform
  (Example figure: processors attached by an FSB to the MCH, with the ICH below)
• NUMA systems: scheme of interconnecting the processors
  (Example figure: processors interconnected directly with one another)

3.1 Design space of the basic platform architecture (3)
a) Scheme of attaching the processors to the rest of the platform (in case of SMP systems)

• DP platforms: single FSB or dual FSBs
• MP platforms: single FSB, dual FSBs or quad FSBs

(Figures: the processors either share one FSB to the MCH or are attached to the MCH by individual FSBs; memory is attached to the MCH.)

3.1 Design space of the basic platform architecture (4)
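The practical difference between these attachment schemes is per-processor bus bandwidth; a toy calculation (the FSB rate used here, 1066 MT/s on an 8-byte wide bus, is illustrative only — actual platforms varied):

```python
def per_processor_bw(fsb_bw_gbs, n_processors, n_fsbs):
    """Bandwidth share of one processor when n_processors share n_fsbs buses."""
    return fsb_bw_gbs * n_fsbs / n_processors

FSB_BW = 1066e6 * 8 / 1e9  # ~8.5 GB/s for an illustrative 1066 MT/s, 8-byte FSB

# 4-processor MP platform:
print(f"single FSB: {per_processor_bw(FSB_BW, 4, 1):.2f} GB/s per processor")
print(f"quad FSBs:  {per_processor_bw(FSB_BW, 4, 4):.2f} GB/s per processor")
```

With quad FSBs each processor gets the full bus to itself, which is the motivation for the multi-FSB MP platforms discussed later.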
b) Scheme of interconnecting the processors (in case of NUMA systems)

• Fully connected mesh: every processor has a direct link to every other processor; each processor has its own memory.
• Partially connected mesh: only some processor pairs are linked directly; each processor has its own memory.

3.1 Design space of the basic platform architecture (5)
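The trade-off between the two mesh types can be quantified by link count; a fully connected mesh of n processors needs n(n-1)/2 links, a sketch:

```python
def fully_connected_links(n):
    # Each of the n processors links directly to the other n - 1;
    # every link is shared by a pair, hence n * (n - 1) / 2.
    return n * (n - 1) // 2

print(fully_connected_links(4))  # 4-socket platform: 6 links
```

A partially connected mesh (e.g. a ring of 4 sockets) gets by with fewer links, at the price of two-hop traffic between some processor pairs.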
The notion of basic platform architecture (repeated for orientation): the basic platform architecture covers the architecture of the processor subsystem and of the memory subsystem.

3.1 Design space of the basic platform architecture (6)
Architecture of the memory subsystem (MSS)

• Point of attaching the MSS
• Layout of the interconnection

3.1 Design space of the basic platform architecture (7)

a) Point of attaching the MSS (Memory Subsystem) (1)

(Figure: within the platform, memory may be attached either to the MCH or to the processor.)

3.1 Design space of the basic platform architecture (8)
Point of attaching the MSS (2)

Attaching memory to the MCH (Memory Control Hub):
• Longer access time (~20–70 %)
• As the memory controller is on the MCH die, the memory type (e.g. DDR2 or DDR3) and speed grade is not bound to the processor chip design.

Attaching memory to the processor(s):
• Shorter access time (~20–70 %)
• As the memory controller is on the processor die, the memory type (e.g. DDR2 or DDR3) and speed grade is bound to the processor chip design.

3.1 Design space of the basic platform architecture (9)
Related terminology

Attaching memory to the MCH:
• DT systems with off-die memory controllers
• Shared memory DP/MP systems: SMP systems (Symmetrical Multiprocessors)

Attaching memory to the processor(s):
• DT systems with on-die memory controllers
• Distributed memory DP/MP systems: NUMA systems (systems with non-uniform memory access)

3.1 Design space of the basic platform architecture (10)
Example 1: Point of attaching the MSS in DT systems

• Intel’s processors before Nehalem: DT system with off-die memory controller — memory attached to the MCH, processor connected to the MCH by the FSB, ICH below the MCH.
• Intel’s Nehalem and subsequent processors: DT system with on-die memory controller — memory attached directly to the processor.

3.1 Design space of the basic platform architecture (11)
Example 2: Point of attaching the MSS in SMP-based DP servers

Intel’s processors before Nehalem (memory attached to the MCH):
• Shared memory DP server, aka Symmetrical Multiprocessor (SMP)
• Memory does not scale with the number of processors
(Figure: two processors on the FSB to the MCH, memory on the MCH, ICH below.)

Intel’s Nehalem and subsequent processors (memory attached to the processors):
• Distributed memory DP server, aka system with non-uniform memory access (NUMA)
• Memory scales with the number of processors
(Figure: two interconnected processors, each with its own memory.)

3.1 Design space of the basic platform architecture (12)
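The scaling claim above can be made concrete with a toy calculation (the 3 channels per controller below is chosen for illustration; it matches, e.g., Nehalem-EP):

```python
def total_memory_channels(n_processors, channels_per_controller, numa):
    # SMP: one memory controller in the MCH serves all processors.
    # NUMA: every processor brings its own on-die memory controller.
    return channels_per_controller * (n_processors if numa else 1)

print(total_memory_channels(2, 3, numa=False))  # SMP DP server: 3 channels total
print(total_memory_channels(2, 3, numa=True))   # NUMA DP server: 6 channels total
```

In the NUMA case both the capacity and the aggregate memory bandwidth grow with the processor count, which is why all later DP/MP platforms moved to on-die memory controllers.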
Figure: Point of attaching the MSS — examples

Attaching memory to the MCH:
• UltraSPARC II (1C) (~1997)
• AMD’s K7 lines (1C) (1999-2003)
• POWER4 (2C) (2001)
• PA-8800 (2004), PA-8900 (2005) and all previous PA lines
• Montecito (2C) (2006)
• Core 2 Duo line (2C) (2006) and all preceding Intel lines
• Core 2 Quad line (2x2C) (2006/2007), Penryn line (2x2C) (2008)

Attaching memory to the processor(s):
• UltraSPARC III (2001) and all subsequent Sun lines
• Opteron server lines (2C) (2003) and all subsequent AMD lines
• POWER5 (2C) (2005) and subsequent POWER families
• Nehalem lines (4C) (2008) and all subsequent Intel lines
• Tukwila (4C) (2010)

3.1 Design space of the basic platform architecture (13)
b) Layout of the interconnection

Figure: Attaching memory via parallel channels or serial links

• Attaching memory via parallel channels: data are transferred over parallel buses.
  E.g. 64 bits of data + address, command and control as well as clock signals in each cycle.
• Attaching memory via serial links: data are transferred over point-to-point links in the form of packets.
  E.g. 16 cycles/packet on a 1-bit wide link, or 4 cycles/packet on a 4-bit wide link.

3.1 Design space of the basic platform architecture (14)
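The two packet examples above are consistent with a simple relation — serialization time scales inversely with link width; a sketch:

```python
def cycles_per_packet(packet_bits, link_width_bits):
    # Serializing a packet takes packet_bits / link_width_bits link cycles.
    assert packet_bits % link_width_bits == 0
    return packet_bits // link_width_bits

print(cycles_per_packet(16, 1))  # 16 cycles/packet on a 1-bit wide link
print(cycles_per_packet(16, 4))  # 4 cycles/packet on a 4-bit wide link
```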
b1) Attaching memory via parallel channels

The memory controller and the DIMMs are connected
• by a single parallel memory channel
• or by a small number of memory channels
to synchronous DIMMs, such as SDRAM, DDR, DDR2 or DDR3 DIMMs.

Example 1: Attaching DIMMs via a single parallel memory channel to the memory controller that is implemented on the chipset [45]

3.1 Design space of the basic platform architecture (15)

Example 2: Attaching DIMMs via 3 parallel memory channels to memory controllers implemented on the processor die
(This is actually Intel’s Tylersburg DP platform, aimed at the Nehalem-EP processor, used for up to 6 cores) [46]

3.1 Design space of the basic platform architecture (16)
The number of lines of the parallel channels

The number of lines needed depends on the kind of the memory modules, as indicated below:

• SDRAM: 168-pin
• DDR: 184-pin
• DDR2: 240-pin
• DDR3: 240-pin

All these DIMM modules provide an 8-byte wide datapath and optionally ECC and registering.

3.1 Design space of the basic platform architecture (17)
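Since all the listed DIMM types share an 8-byte datapath, the peak bandwidth of one parallel channel follows directly from the transfer rate; a quick check against speed grades used later in this section:

```python
def channel_peak_bw_gbs(transfer_rate_mts, width_bytes=8):
    # Peak bandwidth of one channel: transfers/s x datapath width.
    return transfer_rate_mts * width_bytes / 1000.0

print(channel_peak_bw_gbs(533))   # DDR2-533:  ~4.3 GB/s
print(channel_peak_bw_gbs(1333))  # DDR3-1333: ~10.7 GB/s
```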
b2) Attaching memory via serial links

Serial memory links are point-to-point interconnects that use differential signaling.

Two layouts:
• Serial links attach FB-DIMMs: the processor or MCH drives serial links directly to FB-DIMMs, which provide buffering and S/P (serial/parallel) conversion.
• Serial links attach S/P converters with parallel channels: the serial links terminate in S/P converter chips, which in turn attach DIMMs via parallel channels.

3.1 Design space of the basic platform architecture (18)

Example 1: FB-DIMM links in Intel’s Bensley DP platform aimed at Core 2 processors-1

(Figure: two 65 nm Pentium 4 Prescott DP (2x1C)/Core 2 (2C/2x2C) processors — Xeon 5000 (Dempsey) 2x1C / Xeon 5100 (Woodcrest) 2C / Xeon 5300 (Clovertown) 2x2C / Xeon 5400 (Harpertown) 2x2C / Xeon 5200 (Wolfdale-DP) 2C — attached by dual FSBs to the E5000 MCH; FB-DIMM channels with DDR2-533; ESI link to the 631xESB/632xESB IOH.)

ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction

3.1 Design space of the basic platform architecture (19)
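The quoted ESI transfer rate follows directly from the lane count; a one-line check:

```python
def esi_bw_gbs(lanes=4, gb_per_lane=0.25):
    # ESI: 4 PCIe lanes at 0.25 GB/s each, per direction.
    return lanes * gb_per_lane

print(esi_bw_gbs())  # 1.0 GB/s in each direction
```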
Example 2: SMI links in Intel’s Boxboro-EX platform aimed at the Nehalem-EX processors-1

(Figure: Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores): two Nehalem-EX (8C)/Westmere-EX (10C) processors — Xeon 6500 (Nehalem-EX) (Becton) or Xeon E7-2800 (Westmere-EX) — interconnected by QPI; each processor attaches DDR3-1067 via 4 SMI links and SMBs; QPI links to the 7500 IOH; ESI to the ICH10; ME.)

SMI: Serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion
ME: Management Engine

3.1 Design space of the basic platform architecture (20)

Example 2: The SMI link of Intel’s Boxboro-EX platform aimed at the Nehalem-EX processors-2 [26]

• The SMI interface builds on the Fully Buffered DIMM architecture with a few protocol changes, such as those intended to support DDR3 memory devices.
• It has the same layout as FB-DIMM links (14 outbound and 10 inbound differential lanes as well as a few clock and control lanes).
• It needs altogether about 50 PCB traces.

3.1 Design space of the basic platform architecture (21)
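The "about 50 traces" figure for an SMI link can be sanity-checked from the lane counts (the number of clock/control lanes below is an assumption — the slide only says "a few"):

```python
def smi_trace_estimate(outbound_lanes=14, inbound_lanes=10, clk_ctl_lanes=2):
    # Every differential lane needs two PCB traces.
    return (outbound_lanes + inbound_lanes + clk_ctl_lanes) * 2

print(smi_trace_estimate())  # 52 traces -- in line with "about 50"
```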
Design space of the architecture of the MSS

(Figure: the design space is spanned by the point of attaching memory — to the MCH or to the processor(s) — and by the layout of the interconnection — parallel channels attach DIMMs, serial links attach FB-DIMMs, or serial links attach S/P converters with parallel channels — illustrated by schematic platform diagrams for each combination.)

3.1 Design space of the basic platform architecture (22)
Subsequent fields of the design space of the architecture of the MSS, taken from left to right and from top to bottom, allow an increasing number of memory channels (nM) to be implemented, as discussed in Section 4.2.5 and indicated in the next figure.
Max. number of memory channels that can be implemented while using particular design options of the MSS
3.1 Design space of the basic platform architecture (23)
(Figure: the same design space of the MSS architecture, annotated with the maximum number of memory channels (nM) achievable for each design option; nM grows from left to right and from top to bottom.)

Design space of the architecture of the MSS

3.1 Design space of the basic platform architecture (24)
The design space of the basic platform architecture-1

Platform architecture: architecture of the processor subsystem, of the memory subsystem and of the I/O subsystem; the basic platform architecture comprises the first two.

3.1 Design space of the basic platform architecture (25)
The design space of the basic platform architectures-2

Obtained as the combinations of the options available for the main aspects discussed:

Basic platform architecture
• Architecture of the processor subsystem
  • Scheme of attaching the processors (in case of SMP systems)
  • Scheme of interconnecting the processors (in case of NUMA systems)
• Architecture of the memory subsystem (MSS)
  • Point of attaching the MSS
  • Layout of the interconnection

3.1 Design space of the basic platform architecture (26)
Design space of the basic architecture of particular platforms

• Design space of the basic architecture of DT platforms
• Design space of the basic architecture of DP server platforms
• Design space of the basic architecture of MP server platforms

The design space of the basic platform architecture of DT, DP and MP platforms will be discussed in Sections 3.2.1, 3.3.1 and 3.4.1, respectively.

3.1 Design space of the basic platform architecture (27)
3.2 DT platforms

3.2.1. Design space of the basic architecture of DT platforms
3.2.2. Evolution of Intel’s home user oriented multicore DT platforms
3.2.3. Evolution of Intel’s business user oriented multicore DT platforms

3.2.1 Design space of the basic architecture of DT platforms

3.2.1 Design space of the basic architecture of DT platforms (1)
(Figure, DT platforms: design space spanned by the point of attaching the MSS — attaching memory to the MCH vs. to the processor — and by the layout of the interconnection — parallel channels attach DIMMs, serial links attach FB-DIMMs, or serial links attach S/P converters with parallel channels. Pentium D/EE to Penryn (up to 4C) attach memory to the MCH; 1st gen. Nehalem to Sandy Bridge (up to 6C) attach memory to the processor; in both cases parallel channels attach the DIMMs — in DT platforms there was no need for higher memory bandwidth through serial memory interconnection. The number of memory channels is indicated for each option.)

Evolution of Intel’s DT platforms (Overview)

Attaching memory to the MCH (parallel channels attach DIMMs):
• Pentium D/EE 2x1C (2005/6), Core 2 2C (2006)
• Core 2 Quad 2x2C (2007), Penryn 2C/2x2C (2008)

Attaching memory to the processor (parallel channels attach DIMMs):
• 1st gen. Nehalem 4C (2008), 2nd gen. Nehalem 4C (2009), Westmere-EP 6C (2010)
• Westmere-EP 2C+G (2010), Sandy Bridge 2C/4C+G (2011)
• Sandy Bridge-E 6C (2011)

3.2.1 Design space of the basic architecture of DT platforms (2)
Anchor Creek (2005): Pentium D/Pentium EE (2x1C) — FSB — 945/955X/975X MCH (up to DDR2-667; 2/4 DDR2 DIMMs, up to 4 ranks) — DMI — ICH7

Bridge Creek (2006) (Core 2 aimed), Salt Creek (2007) (Core 2 Quad aimed), Boulder Creek (2008) (Penryn aimed): Core 2 2C/Core 2 Quad (2x2C)/Penryn (2C/2x2C) — FSB — 965/3-/4-Series MCH (up to DDR2-800/up to DDR3-1067) — DMI — ICH8/9/10

Tylersburg (2008): 1st gen. Nehalem (4C)/Westmere-EP (6C), memory (up to DDR3-1067) attached to the processor — QPI — X58 IOH — DMI — ICH10

3.2.2 Evolution of Intel’s home user oriented multicore DT platforms (1)
Tylersburg (2008): 1st gen. Nehalem (4C)/Westmere-EP (6C), up to DDR3-1067 attached to the processor — QPI — X58 IOH — DMI — ICH10

Kings Creek (2009): 2nd gen. Nehalem (4C)/Westmere-EP (2C+G), up to DDR3-1333 attached to the processor — FDI/DMI — 5-Series PCH

Sugar Bay (2011): Sandy Bridge (4C+G), up to DDR3-1333 attached to the processor — FDI/DMI2 — 6-Series PCH

3.2.2 Evolution of Intel’s home user oriented multicore DT platforms (2)
Tylersburg (2008): 1st gen. Nehalem (4C)/Westmere-EP (6C), up to DDR3-1067 attached to the processor — QPI — X58 IOH — DMI — ICH10

Waimea Bay (2011): Sandy Bridge-E (4C/6C), up to DDR3-1600 attached to the processor (DDR3-1600: up to 1 DIMM per channel; DDR3-1333: up to 2 DIMMs per channel) — DMI2 — X79 PCH

3.2.2 Evolution of Intel’s home user oriented multicore DT platforms (3)
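The DIMM-per-channel limits above trade capacity for speed; a toy calculation (Sandy Bridge-E's 4 memory channels; the 8 GB DIMM size is an assumption for illustration):

```python
def max_capacity_gb(channels, dimms_per_channel, gb_per_dimm):
    return channels * dimms_per_channel * gb_per_dimm

CHANNELS = 4      # Sandy Bridge-E memory channels
GB_PER_DIMM = 8   # assumed DIMM size, illustration only

print(max_capacity_gb(CHANNELS, 1, GB_PER_DIMM))  # DDR3-1600 population: 32 GB
print(max_capacity_gb(CHANNELS, 2, GB_PER_DIMM))  # DDR3-1333 population: 64 GB
```

Choosing the higher speed grade halves the maximum populated capacity, a typical signal-integrity constraint of parallel memory channels.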
Lyndon (2005): Pentium D/Pentium EE (2x1C) — FSB — 945/955X/975X MCH (up to DDR2-667; 2/4 DDR2 DIMMs, up to 4 ranks) — DMI — ICH7 — LCI — 82573E GbE (Tekoe) for Gigabit Ethernet LAN connection

Averill Creek (2006) (Core 2 aimed), Weybridge (2007) (Core 2 Quad aimed), McCreary (2008) (Penryn aimed): Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C) — FSB — Q965/Q35/Q45 MCH (up to DDR2-800/up to DDR3-1067) — DMI — ICH8/9/10 — LCI/GLCI — 82566/82567 LAN PHY for Gigabit Ethernet LAN connection

Piketon (2009): 2nd gen. Nehalem (4C)/Westmere-EP (2C+G), up to DDR3-1333 attached to the processor — FDI/DMI — Q57 PCH (ME) — PCIe 2.0/SMBus 2.0 — 82578 GbE LAN PHY for Gigabit Ethernet LAN connection

3.2.3 Evolution of Intel’s business user oriented multicore DT platforms (1)
Piketon (2009): 2nd gen. Nehalem (4C)/Westmere-EP (2C+G), up to DDR3-1333 — FDI/DMI — Q57 PCH (ME) — PCIe 2.0/SMBus 2.0 — 82578 GbE LAN PHY for Gigabit Ethernet LAN connection

Sugar Bay (2011): Sandy Bridge (4C+G), up to DDR3-1333 — FDI/DMI2 — Q67 PCH (ME) — PCIe 2.0/SMBus 2.0 — GbE LAN for Gigabit Ethernet LAN connection

3.2.3 Evolution of Intel’s business user oriented multicore DT platforms (2)
3.3 DP server platforms

3.3.1. Design space of the basic architecture of DP server platforms
3.3.2. Evolution of Intel’s low cost oriented multicore DP server platforms
3.3.3. Evolution of Intel’s performance oriented multicore DP server platforms

3.3.1 Design space of the basic architecture of DP server platforms
(Figure, DP platforms: SMP platforms attach the two processors either by a single FSB or by dual FSBs to the MCH; NUMA platforms interconnect the processors directly, each with its own memory. The layout of the interconnection ranges over parallel channels attaching DIMMs, serial links attaching FB-DIMMs, and serial links attaching S/P converters with parallel channels; the number of memory channels (nM) grows along these axes.)

3.3.1 Design space of the basic architecture of DP server platforms (1)

Evolution of Intel’s DP platforms (Overview) — scheme of attaching and interconnecting DP processors

SMP platforms:
• Single FSB, memory attached to the MCH via parallel channels (cost efficient): 90 nm Pentium 4 DP 2x1C (2005) (Paxville DP); Core 2 2C/Core 2 Quad 2x2C/Penryn 2C/2x2C (2006/2007) (Cranberry Lake)
• Dual FSBs, memory attached to the MCH via serial links to FB-DIMMs (high performance): 65 nm Pentium 4 DP 2x1C, Core 2 2C/Core 2 Quad 2x2C/Penryn 2C/2x2C (2006/2007) (Bensley)

NUMA platforms:
• Memory attached to the processors via parallel channels: Nehalem-EP 4C (2009)/Westmere-EP 6C (2010) (Tylersburg-EP); Sandy Bridge-EN 8C (2011) (Romley-EN); Sandy Bridge-EP 8C (2011) (Romley-EP)
• Memory attached to the processors via serial links to S/P converters with parallel channels (high performance): Nehalem-EX/Westmere-EX 8C/10C (2010/2011) (Boxboro-EX)

3.3.1 Design space of the basic architecture of DP server platforms (2)
3.3.2 Evolution of Intel’s low cost oriented multicore DP server platforms (1)

Evolution from the Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Penryn aimed Cranberry Lake DP platform (up to 4 cores)

90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2C): two 90 nm Pentium 4 Prescott DP (2C) processors — Xeon DP 2.8 (Paxville DP) — on a single FSB — E7520 MCH (DDR-266/333, DDR2-400) — HI 1.5 — ICH5R/6300ESB IOH

Penryn aimed Cranberry Lake DP server platform (for up to 4C): two Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C) processors — Xeon 5300 (Clovertown) 2x2C or Xeon 5400 (Harpertown) 4C or Xeon 5200 (Wolfdale-DP) 2C — on a single FSB — E5100 MCH (DDR2-533/667) — ESI — ICH9R

HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate
ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction

3.3.2 Evolution of Intel’s low cost oriented multicore DP server platforms (2)
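The HI 1.5 peak rate quoted above follows from its width, clock and quad-pumped (QDR) signaling; a quick check using the nominal 66.6 MHz clock:

```python
def hi15_peak_mb_s(width_bytes=1, clock_mhz=66.6, transfers_per_clock=4):
    # Hub Interface 1.5: 8-bit (1-byte) wide, ~66 MHz clock, quad-pumped (QDR).
    return width_bytes * clock_mhz * transfers_per_clock

print(round(hi15_peak_mb_s()))  # ~266 MB/s, the quoted peak rate
```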
Evolution from the Penryn aimed Cranberry Lake DP platform (up to 4 cores) to the Sandy Bridge-EN aimed Romley-EN DP platform (up to 8 cores)

Penryn aimed Cranberry Lake DP platform (for up to 4C): two Core 2 (2C/2x2C)/Penryn (2C/4C) processors — Xeon 5300 (Clovertown) 2x2C or Xeon 5400 (Harpertown) 4C or Xeon 5200 (Wolfdale-DP) 2C — on a single FSB — E5100 MCH (DDR2-533/667) — ESI — ICH9R

Sandy Bridge-EN (Socket B2) aimed Romley-EN DP server platform (for up to 8 cores): two Sandy Bridge-EN (8C) processors — E5-2400 (Sandy Bridge-EN 8C) — in Socket B2, DDR3-1600 attached to each processor, interconnected by QPI — DMI2 — C600 PCH

3.3.2 Evolution of Intel’s low cost oriented multicore DP server platforms (3)
3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (1)

Evolution from the Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Core 2 aimed Bensley DP platform (up to 4 cores)

90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2C): two 90 nm Pentium 4 Prescott DP (2x1C) processors — Xeon DP 2.8 (Paxville DP) — on a single FSB — E7520 MCH (DDR-266/333, DDR2-400) — HI 1.5 — ICH5R/6300ESB IOH

Core 2 aimed Bensley DP server platform (for up to 4C): two 65 nm Pentium 4 Prescott DP (2x1C)/Core 2 (2C/2x2C) processors — Xeon 5000 (Dempsey) 2x1C / Xeon 5100 (Woodcrest) 2C / Xeon 5300 (Clovertown) 2x2C / Xeon 5400 (Harpertown) 2x2C / Xeon 5200 (Wolfdale-DP) 2C — on dual FSBs — E5000 MCH (FB-DIMM w/ DDR2-533) — ESI — 631xESB/632xESB IOH

HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate
ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction

3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (2)
Evolution from the Core 2 aimed Bensley DP platform (up to 4 cores) to the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores)

65 nm Core 2 aimed high performance Bensley DP server platform (for up to 4C): two 65 nm Pentium 4 Prescott DP (2C)/Core 2 (2C/2x2C) processors on dual FSBs — 5000 MCH (FB-DIMM w/ DDR2-533) — ESI — 631xESB/632xESB IOH

Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6C): two Nehalem-EP (4C)/Westmere-EP (6C) processors, DDR3-1333 attached to each processor, interconnected by QPI; QPI to a single 55xx IOH¹ or to dual 55xx IOHs¹ (the IOHs linked by QPI, the ME attached via CLink) — ESI — ICH9/ICH10

¹ First chipset with PCIe 2.0. ME: Management Engine

3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (3)
Basic system architecture of the Sandy Bridge-EN and -EP aimed Romley-EN and -EP DP server platforms

Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6 cores): two Nehalem-EP (4C)/Westmere-EP (6C) processors — Xeon 55xx (Gainestown)/Xeon 56xx (Gulftown) — DDR3-1333 attached to each processor, interconnected by QPI — DMI — 34xx PCH (ME)

Sandy Bridge-EP (Socket R) aimed Romley-EP DP server platform (for up to 8 cores) (LGA 2011): two Sandy Bridge-EP (8C) processors — E5-2600 (Sandy Bridge-EP 8C) — in Socket R, DDR3-1600 attached to each processor, interconnected by QPI 1.1 — DMI2 — C600 PCH

3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (4)
Contrasting the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores) to the Nehalem-EX aimed very high performance scalable Boxboro-EX DP platform (up to 10 cores)

Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6 cores): two Nehalem-EP (4C)/Westmere-EP (6C) processors — Xeon 5500 (Gainestown) or Xeon 5600 (Gulftown) — DDR3-1333 attached to each processor, interconnected by QPI — ESI — 34xx PCH (ME)

Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores): two Nehalem-EX (8C)/Westmere-EX (10C) processors — Xeon 6500 (Nehalem-EX) (Becton) or Xeon E7-2800 (Westmere-EX) — each attaching DDR3-1067 via 4 SMI links and SMBs, interconnected by QPI — QPI — 7500 IOH — ESI — ICH10 (ME)

SMI: Serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion

3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (5)
3.4 MP server platforms

3.4.1. Design space of the basic architecture of MP server platforms
3.4.2. Evolution of Intel’s multicore MP server platforms
3.4.3. Evolution of AMD’s multicore MP server platforms

3.4.1 Design space of the basic architecture of MP server platforms
(Figure, MP SMP platforms: four processors attached by a single FSB — Pentium 4 MP 1C (2004) —, by dual FSBs — 90 nm Pentium 4 MP 2x1C —, or by quad FSBs — Core 2/Penryn up to 6C —; memory attached to the MCH via parallel channels or via serial links to S/P converters with parallel channels.)

3.4.1 Design space of the basic architecture of MP server platforms (1)

(Figure, MP NUMA platforms: four processors, each with its own memory, interconnected by a partially connected mesh — AMD Direct Connect Architecture 1.0 (2003) — or by a fully connected mesh — AMD Direct Connect Architecture 2.0 (2010), Nehalem-EX/Westmere-EX up to 10C (2010/11) with serial links attaching S/P converters. Both the memory bandwidth and the inter-processor bandwidth grow along the axes of the design space.)

3.4.1 Design space of the basic architecture of MP server platforms (2)

Evolution of Intel’s MP platforms (Overview) — scheme of attaching and interconnecting MP processors

SMP platforms (scheme of attaching the processors):
• Single FSB: Pentium 4 MP 1C (2004) (not named)
• Dual FSBs: 90 nm Pentium 4 MP 2x1C (2006) (Truland)
• Quad FSBs: Core 2/Penryn up to 6C (2006/2007) (Caneland)

NUMA platforms (scheme of interconnecting the processors):
• Partially connected mesh: AMD DCA 1.0 (2003)
• Fully connected mesh: AMD DCA 2.0 (2010); Nehalem-EX/Westmere-EX up to 10C (2010/11) (Boxboro-EX)

The number of memory channels grows along the layout axis (parallel channels attach DIMMs — serial links attach FB-DIMMs — serial links attach S/P converters with parallel channels).

3.4.1 Design space of the basic architecture of MP server platforms (3)
3.4.2 Evolution of Intel’s multicore MP server platforms (1)

Evolution from the first generation MP servers supporting SC processors to the 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (supporting up to 2 cores)

Previous Pentium 4 MP aimed MP server platform (for single core processors): four Xeon MP (1C) processors on a single FSB — preceding north bridges (NBs) (e.g. DDR-200/266) — e.g. HI 1.5 (266 MB/s) — preceding ICHs

90 nm Pentium 4 Prescott MP aimed Truland MP server platform (for up to 2C): four Pentium 4 Xeon MP (1C/2x1C) processors — Xeon MP (Potomac) 1C / Xeon 7000 (Paxville MP) 2x1C / Xeon 7100 (Tulsa) 2C — on dual FSBs — E8500¹/E8501 MCH with XMBs (DDR-266/333, DDR2-400) — HI 1.5 — ICH5

3.4.2 Evolution of Intel’s multicore MP server platforms (2)
Evolution from the 90 nm Pentium 4 Prescott MP aimed Truland MP platform (up to 2 cores) to the Core 2 aimed Caneland MP platform (up to 6 cores)

90 nm Pentium 4 Prescott MP aimed Truland MP server platform (for up to 2C): four Pentium 4 Xeon MP (1C/2x1C) processors — Xeon MP (Potomac) 1C / Xeon 7000 (Paxville MP) 2x1C / Xeon 7100 (Tulsa) 2C — on dual FSBs — E8500¹/E8501 MCH with XMBs (DDR-266/333, DDR2-400) — HI 1.5 — ICH5

Core 2 aimed Caneland MP server platform (for up to 6C): four Core 2 (2C/2x2C)/Penryn (6C) processors — Xeon 7200 (Tigerton DC) 1x2C / Xeon 7300 (Tigerton QC) 2x2C / Xeon 7400 (Dunnington 6C) — on quad FSBs — 7300 MCH (FB-DIMM DDR2-533/667, 4 channels, up to 8 DIMMs/channel) — ESI — 631xESB/632xESB

¹ The E8500 MCH supports an FSB of 667 MT/s and consequently only the SC Xeon MP (Potomac)

HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate
ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface), providing 1 GB/s transfer rate in each direction

3.4.2 Evolution of Intel’s multicore MP server platforms (3)
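The Caneland memory figures illustrate why FB-DIMM was attractive for servers: buffered serial channels allow far more DIMMs per channel than a parallel DDR2 channel (typically 2-4 DIMMs). A quick count of the 7300 MCH's slots:

```python
def fbdimm_slots(channels=4, dimms_per_channel=8):
    # 7300 MCH: 4 FB-DIMM channels, up to 8 DIMMs on each.
    return channels * dimms_per_channel

print(fbdimm_slots())  # 32 DIMM slots per platform
```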
Evolution to the Nehalem-EX aimed Boxboro-EX MP platform (that supports up to 10 cores) (in the basic system architecture we show the single IOH alternative)

Nehalem-EX aimed Boxboro-EX MP server platform (for up to 10C): four Nehalem-EX (8C)/Westmere-EX (10C) processors — Xeon 7500 (Nehalem-EX) (Becton) 8C / Xeon E7-4800 (Westmere-EX) 10C — fully interconnected by QPI links; each processor attaches DDR3-1067 via 2x4 SMI channels and SMBs; QPI — 7500 IOH — ESI — ICH10 (ME)

SMI: Serial link between the processor and the SMBs
SMB: Scalable Memory Buffer (parallel/serial converter)
ME: Management Engine

3.4.2 Evolution of Intel’s multicore MP server platforms (4)
3.4.3 Evolution of AMD’s multicore MP server platforms [47] (1)

AMD Direct Connect Architecture 1.0: introduced in the single core K8-based Opteron DP/MP servers (AMD 24x/84x) (6/2003). Memory: 2 channels DDR-200/333 per processor, 4 DIMMs per channel.

AMD Direct Connect Architecture 2.0: introduced in the 2x6 core K10-based Magny-Cours (AMD 6100) (3/2010). Memory: 2x2 channels DDR3-1333 per processor, 3 DIMMs per channel.

3.4.3 Evolution of AMD’s multicore MP server platforms [47] (2)
9. References
9. References (1)
[1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino
[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr. 24 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server +Architecture%3B+Platform...-a053949226
[3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/
[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29.
[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004, http://pcworld.about.net/news/Dec072004id118866.htm
[6]: Perich D., Intel Volume Platforms Technology Leadership, Presentation at HP World 2004, http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf
[7]: Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations, Intel Press release, Aug. 2 2004, http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm
[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March 1 2005, http://www.silentpcreview.com/article224-page2
[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf
[10]: Davis L., PCI Express Bus, http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html
9. References (2)
[11]: Ng P. K., High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor, IDF Taipei, TDPS001, 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDF-Taipei_TDPS001_100.pdf
[12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor/products/dram/Products_ComputingDRAM.html
[13]: Samsung's Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011, http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/Documents/downloads/green_ddr3_2011.pdf
[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org
[15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf
[16]: Solanki V., Design Guide Lines for Registered DDR DIMM Module, Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf
[17]: Detecting Memory Bandwidth Saturation in Threaded Applications, Intel, March 2 2010, http://software.intel.com/en-us/articles/detecting-memory-bandwidth-saturation-in-threaded-applications/
[18]: McCalpin J. D., STREAM Memory Bandwidth, July 21 2011, http://www.cs.virginia.edu/stream/by_date/Bandwidth.html
[19]: Rogers B., Krishna A., Bell G., Vu K., Jiang X., Solihin Y., Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling, ISCA 2009, Vol. 37, Issue 1, pp. 371-382
9. References (3)
[20]: Vogt P., Fully Buffered DIMM (FB-DIMM) Server Memory Architecture: Capacity, Performance, Reliability, and Longevity, Febr. 18 2004, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf
[21]: Wikipedia: Intel X58, 2011, http://en.wikipedia.org/wiki/Intel_X58
[22]: Sharma D. D., Intel 5520 Chipset: An I/O Hub Chipset for Server, Workstation, and High End Desktop, Hotchips 2009, http://www.hotchips.org/archives/hc21/2_mon/HC21.24.200.I-O-Epub/HC21.24.230.DasSharma-Intel-5520-Chipset.pdf
[23]: DDR2 SDRAM FBDIMM, Micron Technology, 2005, http://download.micron.com/pdf/datasheets/modules/ddr2/HTF18C64_128_256x72F.pdf
[24]: Wikipedia: Fully Buffered DIMM, 2011, http://en.wikipedia.org/wiki/Fully_Buffered_DIMM
[25]: Intel E8500 Chipset eXternal Memory Bridge (XMB) Datasheet, March 2005, http://www.intel.com/content/dam/doc/datasheet/e8500-chipset-external-memory-bridge-datasheet.pdf
[26]: Intel 7500/7510/7512 Scalable Memory Buffer Datasheet, April 2011, http://www.intel.com/content/dam/doc/datasheet/7500-7510-7512-scalable-memory-buffer-datasheet.pdf
[27]: AMD Unveils Forward-Looking Technology Innovation To Extend Memory Footprint for Server Computing, July 25 2007, http://www.amd.com/us/press-releases/Pages/Press_Release_118446.aspx
[28]: Chiappetta M., More AMD G3MX Details Emerge, Aug. 22 2007, Hot Hardware, http://hothardware.com/News/More-AMD-G3MX-Details-Emerge/
9. References (4)
[29]: Goto S. H., AMD's Next Server Platforms, May 20 2008, PC Watch, http://pc.watch.impress.co.jp/docs/2008/0520/kaigai440.htm
[30]: Wikipedia: Socket G3 Memory Extender, 2011, http://en.wikipedia.org/wiki/Socket_G3_Memory_Extender
[31]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Febr. 2003, Xilinx Inc.
[32]: Ahn J.-H., Memory Design Overview, March 2007, Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf
[33]: Ebeling C., Koontz T., Krueger R., System Clock Management Simplified with Virtex-II Pro FPGAs, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf
[34]: Kirstein B., Practical timing analysis for 100-MHz digital design, EDN, Aug. 8 2002, www.edn.com
[35]: Jacob B., Ng S. W., Wang D. T., Memory Systems: Cache, DRAM, Disk, Elsevier, 2008
[36]: Allan G., The outlook for DRAMs in consumer electronics, EETIMES Europe Online, 01/12/2007, http://eetimes.eu/showArticle.jhtml?articleID=196901366&queryText=calibrated
[37]: Ebeling C., Koontz T., Krueger R., System Clock Management Simplified with Virtex-II Pro FPGAs, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf
9. References (5)
[38]: Introducing FB-DIMM Memory: Birth of Serial RAM?, PCStats, Dec. 23 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[45]: Memory technology evolution: an overview of system memory technologies, Technology brief, 9th edition, HP, Dec. 2010, http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00256987/c00256987.pdf
[46]: Kane L., Nguyen H., Take the Lead with Jasper Forest, the Future Intel Xeon Processor for Embedded and Storage, IDF 2009, July 27 2009, ftp://download.intel.com/embedded/processor/prez/SF09_EMBS001_100.pdf
[47]: The AMD Opteron™ 6000 Series Platform: More Cores, More Memory, Better Value, March 29 2010, http://www.slideshare.net/AMDUnprocessed/amd-opteron-6000-series-platform-press-presentation-final-3564470