Dezső Sima 2011 November (Ver. 1.4) Sima Dezső, 2011 Platforms I.
Contents
1. Introduction to platforms
2. Main components of platforms
3. Platform architectures
4. Memory subsystem design considerations
5. References
1. Introduction to platforms
1.1. The notion of platform
1.2. Description of particular platforms
1.3. Representation forms of platforms
1.4. Compatibility of platform components
1.1. The notion of platform
The notion platform is widely used in different segments of the IT industry, e.g. by IC manufacturers, system providers or even by software suppliers, with different interpretations. Here we focus on the platform concept as typically used by system providers.
1.1 The notion of platform (1)
1.1 The notion of platform
System providers however, may use the notion platform either in a more general or a more specific sense.
Interpretation of the notion platform
• Interpretation in a more general sense: unified system design.
• Interpretation in a more specific sense: a particular unified system architecture, developed for a given application area, such as a DT or MP platform.
Example: Intel’s Core 2 Duo (and Core 2 Extreme, the highest speed model) aimed DT platform (the Bridge Creek platform)
[Thumbnail, components shown: Core 2 Duo/Core 2 Extreme (2C); FSB: 1066/800/533 MT/s speed; 965 Series MCH; two memory channels, DDR2-800/666/533, two DIMMs per channel; DMI and C-link; ICH8; ME]
Unified system design means that the system architecture is partitioned to a small number of standard components, such as the processor, memory control hub (MCH), I/O control hub (ICH) that are interconnected by specified (standard) interconnections.
Thus the notion platform designates system architectures with unified design in the above sense.
1.1 The notion of platform (2)
Interpretation of the notion platform in a more general sense
Remark
The need for a unified system design, called platform design, arose in the PC industry at the time when PCI-based system designs were replaced by port-based system designs, about 1998-1999.
1.1 The notion of platform (3)
Late PCI-based system architecture (~1998) (used typically with Pentium II/III, built around Intel’s 440xx chipset)
[Figure, components shown: Pentium II/Pentium III processors; processor bus; system controller; main memory (EDO/SDRAM); AGP; PCI bus; PCI device adapter; peripheral controller; 2x IDE/ATA 33/66; 2x USB; ISA bus; ISA device adapter (legacy and/or slow devices)]
Early port-based system architecture (~1999) (used first with Pentium III, built around Intel’s 810 chipset)
[Figure, components shown: Pentium III; processor bus; system controller; main memory (SDRAM); AGP; hub interface; peripheral controller; 2x IDE/ATA 33/66/100; 2x/4x USB; AC'97; PCI bus; PCI device adapter; PCI to ISA bridge; ISA bus; ISA device adapter; LPC; Super I/O (KBD, MS, etc.); legacy devices]
1.1 The notion of platform (4)
In a more specific sense the notion platform refers to a particular unified system architecture that is developed for a given application area, such as a DT, DP or MP platform.
In this sense the notion platform is interpreted as a standardized backbone of a system architecture developed for a given application area that is built up typically of
• the processor or processors,
• the chipset,
• in some cases, such as in mobile or business oriented DT platforms, also the networking component [7],
• the buses interconnecting the above components of the platform as well as
• the memory subsystem (MSS) that is attached by a specific memory interface.
Subsequently, we will focus on the interpretation of the notion platform in this latter sense.
[Figure: Basic components of a platform: processor or processors; chipset; (LAN controller); buses interconnecting the preceding basic components; the memory subsystem]
1.1 The notion of platform (6)
Interpretation of the notion platform in a more specific sense
The primary goals of introducing unified system designs are
• to minimize design rework while moving from one processor generation to the next,
• to stabilize interfaces for server and desktop designs [2] and
• to shorten the time to market.
1.1 The notion of platform (5)
Example 1: Intel’s Core 2 aimed home user DT platform (Bridge Creek) [3]
[Block diagram of the platform, labels recoverable: two memory channels with 2 DIMMs/channel each; display card; C-link; 1066 MT/s]
1.1 The notion of platform (8)
Example 2: Intel’s Nehalem-EX aimed Boxboro-EX MP server platform, assuming 1 IOH
[Figure, components shown: four processors, each a Xeon 7500 (Nehalem-EX, Beckton), 8C, or a Xeon E7-4800 (Westmere-EX), 10C; QPI links; 7500 IOH; ESI; ICH10; ME; per processor 2x4 SMI channels to SMBs with DDR3-1067 memory]
SMI: Serial link between the processors and the SMBs
SMB: Scalable Memory Buffer (parallel/serial conversion)
ME: Management Engine
The figure marks the platform and the interfaces connecting the platform components.
1.1 The notion of platform (9)
The structure of a platform is termed as its architecture (or topology).
It describes the basic components and their interconnections and will be discussed in Section 3.
1.1 The notion of platform (9)
Many facets of the platform concept
The platform concept as seen from the point of view of the system developers
• The platform concept and platform based design will be considered as part of the system level design.
• It became the topic of scientific research at the end of the 1990s, see e.g. [4].
Main goals of the system level design are
• to reduce the complexity of designing complex systems by partitioning them,
• in this way to reduce the time-to-market of products,
• to be able to enhance system components (such as processors) in an upward compatible way as long as the same interfaces (e.g. an FSB with a given max. frequency) are used.
Co-design of platform components
Platform components are typically co-designed, announced and delivered as a set.
1.1 The notion of platform (10)
The platform concept as seen from the point of view of the manufacturers
• With the platform concept in mind, manufacturers like Intel or AMD plan, design and market all key components of a platform, such as the processor or processors and the related chipset, as an integrated entity [5].
• This is beneficial for the manufacturers since it motivates OEMs, as system providers, to buy all key parts of a computer system from the same manufacturer.
1.1 The notion of platform (11)
The platform concept as seen from the point of view of the customers
The platform concept is beneficial for the customers as well, since an integrated “backbone” of a system architecture promises a more reliable and more cost effective system.
1.1 The notion of platform (12)
Historical remarks
System providers began using the notion “platform” about 2000, like
• Philips’ Nexperia digital video platform (1999),
• Texas Instruments’ (TI) OMAP platform for SOCs (2002),
• Intel’s first generation mobile oriented Centrino platform for laptops, designated as the Carmel platform (3/2003).
Intel contributed significantly to spreading the notion platform when, based on the success of their Centrino platform, they introduced this concept also for their desktops [5] and servers [6], [7] in 2004.
1.1 The notion of platform (13)
Intel’s early server and workstation roadmap from Aug. 2004 [6]
Note
a) This roadmap already makes use of the notion platform without revealing platform names.
b) In 2004 Intel made a transition from 32-bit systems to 64-bit systems.
1.1 The notion of platform (14)
Intel’s multicore platform roadmap announced at the IDF Spring 2005 [8]
Note
This roadmap includes also the particular platform designations for desktops, UP servers etc.
1.1 The notion of platform (15)
1.2. Description of a particular platform
Description of a particular platform
• Detailing the platform architecture
Example: The Tylersburg DT platform (2008)
[Figure rows: Processor; MCH; ICH]
Detailing the platform architecture includes the architecture of the processor-, the memory- and the I/O subsystems (to be discussed in Section 3).
1.2 Description of a particular platform (2)
It is concerned with issues such as whether the processors of an MP server are connected to the MCH via an FSB or otherwise, or whether the memory is attached to the system architecture through the MCH or through the processors, etc.
Description of a particular platform
• Detailing the platform architecture
• Identification of the platform components
Example: The Tylersburg DT platform (2008)
Processor: 1. gen. Nehalem (4C)/Westmere-EP (6C)
MCH: X58 IOH
ICH: ICH10
1.2 Description of a particular platform (3)
1.2 Description of a particular platform (4)
Description of a particular platform
• Detailing the platform architecture
• Identification of the platform components
• Specification of the interfaces interconnecting the platform components
Example: The Tylersburg DT platform (2008)
Processor: 1. gen. Nehalem (4C)/Westmere-EP (6C)
Interface between the processor and the X58 IOH: QPI
MCH: X58 IOH
Interface between the X58 IOH and the ICH10: DMI
ICH: ICH10
Remark
The specification of a platform is completed by the datasheets of the related platform components.
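The three description steps above (detailing the architecture, identifying the components, specifying the interfaces) can be captured as a small data structure. The following is only a sketch; the class names `Link` and `Platform`, the role names and the `neighbours` helper are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """An interface interconnecting two platform components (hypothetical type)."""
    name: str
    endpoints: tuple  # pair of component roles


@dataclass
class Platform:
    """A platform: identified components plus the specified interfaces."""
    name: str
    components: dict                            # role -> identified component
    links: list = field(default_factory=list)   # specified interfaces

    def neighbours(self, role):
        """Roles directly connected to `role` by a specified interface."""
        return sorted(
            other
            for link in self.links
            if role in link.endpoints
            for other in link.endpoints
            if other != role
        )


# Component and interface names are taken from the Tylersburg example;
# the role keys ("processor", "IOH", "ICH") are assumptions of this sketch.
tylersburg = Platform(
    name="Tylersburg DT platform (2008)",
    components={
        "processor": "1. gen. Nehalem (4C) / Westmere-EP (6C)",
        "IOH": "X58 IOH",
        "ICH": "ICH10",
    },
    links=[
        Link("QPI", ("processor", "IOH")),
        Link("DMI", ("IOH", "ICH")),
    ],
)

print(tylersburg.neighbours("IOH"))
```

Keeping components and interfaces separate mirrors the slides: the same architecture can be reused while individual components are exchanged.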
1.2 Description of a particular platform (5)
Dependence of the platform architecture on the platform category
Platforms may be classified according to the target area of application, such as
• Desktop (DT) platforms
• Dual processor (DP) platforms
• Quad processor (MP) platforms
• Mobile platforms
Of course, beyond the above categories further processor categories and related platforms also exist, such as embedded processors and related platforms.
In conformity with the different platform categories, different platform architectures also arise:
• Architecture of DT platforms
• Architecture of DP platforms
• Architecture of MP platforms
• Architecture of mobile platforms
In these slides platform architectures will be discussed in Section 3, restricted however to DT, DP and MP platforms.
1.2 Description of a particular platform (6)
1.3. Representation forms of platforms
1.3 Representation forms of platforms (1)
1.3 Representation forms of platforms
a) Thumbnail representation
b) Roadmap like representation (an arbitrarily chosen representation form in these slides)
c) Block diagram of a platform.
1.3 Representation forms of platforms (3)
a) Thumbnail representation
It is a concise representation of a particular platform.
In particular, the thumbnail representation
• reveals the platform architecture,
• identifies the basic components of a platform, such as the processor or processors, the chipset, in some cases (e.g. in mobile platforms) also the Gigabit Ethernet controller,
• and specifies the interconnection links (buses) between the platform components.
Example
Intel’s Core 2 Duo aimed home user oriented platform (the Bridge Creek platform)
[Thumbnail, components shown: Core 2 Duo/Core 2 Extreme (2C); FSB: 1066/800/533 MT/s speed; 965 Series MCH; two DDR2 channels, DDR2-800/666/533, two DIMMs per channel; DMI and C-link; ICH8; ME]
1.3 Representation forms of platforms (5)
Example for stating the compatibility range of a platform
The Core 2 Duo aimed DT platform that targets home users (designated as the Bridge Creek platform).
[Thumbnail, components shown: Core 2 Duo/Core 2 Extreme (2C); FSB: 1066/800/533 MT/s speed; 965 Series MCH; two DDR2 channels, DDR2-800/666/533, two DIMMs per channel; DMI and C-link; ICH8; ME]
Beyond the target processor this platform may also be used with
• the previous Pentium D/EE and Pentium 4 6x0/6x1/EE and
• the subsequent Core 2 Quad lines of processors,
as shown in the next slides.
Roadmap like representation of the Core 2-aimed (65 nm) DT platform:
• DT core (7/2006): Core 2 Duo (2C): E6xxx/E4xxx; Core 2 Extreme (2C): X6800. E6xxx/X6800 (1): Conroe; E4xxx (1): Allendale. 65 nm. Conroe: 291 mtrs/143 mm2; Allendale: 167 mtrs/111 mm2. Conroe: 4 MB L2; Allendale: 2 MB L2. X6800/E6xxx: 1066 MT/s; E4xxx: 800 MT/s. LGA775.
• MCH (6/2006): 965 Series (Broadwater). FSB 1066/800/533 MT/s; 2 DDR2 channels; DDR2-800/666/533; 4 ranks/channel; 8 GB max.
• ICH (6/2006): ICH8.
• DT platform (6/2006): Bridge Creek.
(1) The Allendale is a later stepping (Steppings L2/M0) of the Core 2 (Steppings B2/G0) that typically provided only 2 MB L2 and appeared in 1/2007.
Support of Pentium 4/D/EE processors
The platform also supports Pentium D/EE processors (90/65 nm) and Pentium 4 6x0/6x1/EE processors (90 nm).
DT cores:
• Pentium 4 6x0/6x1/EE (Prescott-2M), 2/2005: 1C; 90 nm; 169 mtrs; 135 mm2; 2 MB L2; 800 MT/s; two-way multithreading; LGA775
• Pentium D/EE 8xx (1) (Smithfield), 5/2005: 2x1C; 90 nm; 2x115 mtrs; 2x103 mm2; 2x1 MB L2; 800/533 MT/s; no multithreading; LGA775
• Pentium D/EE 9xx (2, 3) (Presler), 1/2006: 2x1C; 65 nm; 2x188 mtrs; 2x81 mm2; 2x2 MB L2; 1066/800 MT/s; no multithreading; LGA775
(1) Pentium EE 840 supports only 800 MT/s
(2) Pentium D 9xx support only 800 MT/s
(3) Pentium EE 955/965 supports only 1066 MT/s
1.3 Representation forms of platforms (6)
Support of Core 2 Quad processors
The Core 2-aimed (65 nm) Bridge Creek platform (Core 2 Duo/Core 2 Extreme, 965 Series (Broadwater) MCH, ICH8) also supports Core 2 Quad processors (65 nm):
• Core 2 Quad (2x2C): Q6xxx (Kentsfield), 11/2006: 65 nm; 2x291 mtrs/2x143 mm2; 2x4 MB L2; 1066 MT/s; LGA775
1.3 Representation forms of platforms (7)
c) Block diagram of a platform
Example: The Core 2 aimed home user DT platform (Bridge Creek) (without an integrated display controller) [3]
[Block diagram, labels recoverable: two memory channels with 2 DIMMs/channel each; display card; C-link; 1066 MT/s]
1.3 Representation forms of platforms (8)
1.4. Compatibility of platform components
1.4 Compatibility of platform components (1)
One of the goals of platform based designs is to use stabilized interfaces (at least for a while) to minimize or eliminate design rework while moving from one processor generation to the next [2]. Consequently, in platform based designs, platform components, such as processors or chipsets of a given line, are typically compatible with their previous or subsequent generations as long as the same interfaces are used and neither interface parameters (such as the FSB speed) nor other implementation requirements (of either the components to be substituted or the substituting components) restrict this.
In the discussed DT platform the target processor is the Core 2, which is connected to the MCH by an FSB with 1066/800/533 MT/s. The target processor of the platform can, however, be substituted
• either by processors of three previous generations or
• by processors of the subsequent generation (Core 2 Quad),
since all these processors have FSBs of 533/800/1066 MT/s, as shown before.
1.4 Compatibility of platform components (2)
Limits of compatibility
Nevertheless, the highest performance level Core 2 Quad, termed the Core 2 Extreme Quad, already provided an increased FSB speed of 1333 MT/s and was therefore no longer supported by the Core 2 aimed platform considered.
[Thumbnail of the platform considered, components shown: Core 2 Duo/Core 2 Extreme (2C); FSB: 1066/800/533 MT/s; 965 Series MCH; two memory channels, DDR2-800/666/533, two DIMMs per channel; DMI and C-link; ICH8; ME]
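The FSB speed argument of Section 1.4 can be written down as a simple set intersection. The sketch below uses the FSB speeds quoted in these slides; the function name `is_fsb_compatible` and the table layout are assumptions. Matching FSB speed is treated only as a necessary condition, since other implementation requirements may still restrict compatibility.

```python
# FSB speeds (MT/s) supported by the Bridge Creek platform (965 Series MCH).
PLATFORM_FSB_MTS = {1066, 800, 533}

# FSB speeds (MT/s) of candidate processors, as quoted in the slides.
PROCESSOR_FSB_MTS = {
    "Core 2 Duo E6xxx": {1066},
    "Core 2 Duo E4xxx": {800},
    "Pentium D/EE 8xx": {800, 533},
    "Pentium 4 6x0/6x1/EE": {800},
    "Core 2 Quad Q6xxx": {1066},
    "Core 2 Extreme Quad": {1333},
}


def is_fsb_compatible(processor_speeds, platform_speeds):
    """True if processor and platform share at least one FSB speed."""
    return bool(processor_speeds & platform_speeds)


supported = {
    name: is_fsb_compatible(speeds, PLATFORM_FSB_MTS)
    for name, speeds in PROCESSOR_FSB_MTS.items()
}

for name, ok in supported.items():
    print(f"{name}: {'supported' if ok else 'not supported'} (FSB check only)")
```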
2. Basic components of platforms
2.1. Processors
2.2. Buses interconnecting platform components
2.3. The memory subsystem
Basic components of platforms - Overview
As already discussed in Section 1, the notion platform is interpreted as a standardized backbone of a system architecture developed for a given application area that is built up typically of
• the processor or processors,
• the chipset,
• in some cases, such as in mobile or business oriented DT platforms, also the networking component [7],
• the buses interconnecting the above components of the platform as well as
• the memory subsystem (MSS) that is attached by a specific memory interface.
Subsequently, we will discuss the following three basic components of platforms:
• Processors (Section 2.1),
• Buses interconnecting platform components (excluding memory buses) (Section 2.2) and
• The memory subsystem (Section 2.3).
2.1. Processors
2.1 Processors (1)
Figure 2.1: Overview of Intel’s Tick-Tock model (based on [17])
Intel’s Tick-Tock model and key microarchitectural features (a new step about every 2 years):
• 11/2000, 180 nm: Pentium 4/Willamette. New microarch.
• 01/2002, 130 nm: Pentium 4/Northwood. Adv. microarch., hyperthreading
• 02/2004, 90 nm: Pentium 4/Prescott. Adv. microarch., hyperthreading, 64-bit
• 01/2006, 65 nm (TICK): Pentium 4/Cedar Mill
• 07/2006, 65 nm (TOCK): Core 2. New microarch., 4-wide core, 128-bit SIMD, no hyperthreading
• 11/2007
• 11/2008: New microarch., hyperthreading, (inclusive) L3, integrated MC, QPI
• 01/2010
• 01/2011: New microarch., hyperthreading, 256-bit AVX, integr. GPU, ring bus
Basic architectures and their related shrinks
(considered from the Pentium 4 Prescott, the third core of Pentium 4, on)
Basic architecture | Year | Technology | Core
Pentium 4 (Prescott) | 2005 | 90 nm | Pentium 4
Pentium 4 (Prescott) | 2006 | 65 nm | Pentium 4
Core 2 | 2006 | 65 nm | Core 2
Core 2 | 2007 | 45 nm | Penryn
Nehalem | 2008 | 45 nm | Nehalem
Nehalem | 2010 | 32 nm | Westmere
Sandy Bridge | 2011 | 32 nm | Sandy Bridge
Sandy Bridge | 2012 | 22 nm | Ivy Bridge
2.1 Processors (2)
In 2003 Intel shifted the focus of their processor development from the performance goal to the aspect of performance per watt, as stated in a slide from 4/2006, see below.
Figure 2.3: Intel’s plan to develop their manufacturing technology and processor lines, revealed at a shareholders’ meeting back in 4/2006 [18]
2.1 Processors (4)
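Basic Arch. | Techn. | Core/technology | Cores | Intro. | Cache arch. | Interf.
Core 2 | 65 nm | X6800 Conroe | 2C | 7/2006 | 4 MB L2/2C | FSB
Core 2 | 65 nm | E6xxx Conroe | 2C | 7/2006 | 2/4 MB L2/2C | FSB
Core 2 | 65 nm | E4xxx Allendale | 2C | 1/2007 | 4 MB L2/2C | FSB
Core 2 | 65 nm | E6xxx Allendale | 2C | 7/2007 | 4 MB L2/2C | FSB
Core 2 | 65 nm | QX67xx Kentsfield | 2x2C | 11/2006 | 4 MB L2/2C | FSB
Core 2 | 65 nm | Q6xxx Kentsfield | 2x2C | 1/2007 | 4 MB L2/2C | FSB
Penryn | 45 nm | E8xxx Wolfdale | 2C | 1/2008 | 6 MB L2/2C | FSB
Penryn | 45 nm | E7xxx Wolfdale-3M | 2C | 4/2008 | 3 MB L2/2C | FSB
Penryn | 45 nm | QX9xxx Yorkfield XE | 2x2C | 11/2007 | 6 MB L2/2C | FSB
Penryn | 45 nm | Q9xxx Yorkfield | 2x2C | 1/2008 | 6 MB L2/2C | FSB
Penryn | 45 nm | Q9xxx Yorkfield-6M | 2x2C | 1/2008 | 3 MB L2/2C | FSB
Penryn | 45 nm | Q8xxx Yorkfield-4M | 2x2C | 8/2008 | 2 MB L2/2C | FSB
1. G. Nehalem-EP | 45 nm | i7-920-965 Bloomfield | 4C | 11/2008 | ¼ MB L2/C, 8 MB L3 | QPI
2. G. Nehalem-EP | 45 nm | i7-8xxx/i5-7xx Lynnfield | 4C | 9/2009 | ¼ MB L2/C, 8 MB L3 | DMI
Westmere-EP | 32 nm | i7-9xxX Gulftown | 6C | 3/2010 | ¼ MB L2/C, 12 MB L3 | QPI
Westmere-EP | 32 nm | i7-9xx Gulftown | 6C | 7/2010 | ¼ MB L2/C, 12 MB L3 | QPI
Westmere-EP | 32 nm | i5-6xx/i3-5xx Clarkdale | 2C+G | 1/2010 | ¼ MB L2/C, max. 4 MB L3 | DMI
Sandy Bridge | 32 nm | i7-26/27/28/29xx Sandy Bridge | 2/4C+G | 1/2011 | ¼ MB L2/C, 4/8 MB L3 | DMI2
Sandy Bridge | 32 nm | i5-23/24/25xx Sandy Bridge | 2/4C+G | 1/2011 | ¼ MB L2/C, 3/6 MB L3 | DMI2
Sandy Bridge | 32 nm | i3-21/23xx Sandy Bridge | 2C+G | 1/2011 | ¼ MB L2/C, 3 MB L3 | DMI2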
Table 2.1: Intel’s Core 2 based and subsequent multicore DT processor lines
2.1 Processors (5)
Basic Arch. | Core/technology | Intro. | DP server processors | Cores and caches
Pentium 4 (Prescott) | Pentium 4 90 nm | 10/2005 | Paxville DP 2.8 | 2x1 C, 2 MB L2/C
Pentium 4 (Prescott) | Pentium 4 65 nm | 5/2006 | 5000 (Dempsey) | 2x1 C, 2 MB L2/C
Core 2 | Core 2 65 nm | 6/2006 | 5100 (Woodcrest) | 1x2 C, 4 MB L2/C
Core 2 | Core 2 65 nm | 11/2006 | 5300 (Clovertown) | 2x2 C, 4 MB L2/C
Core 2 | Penryn 45 nm | 11/2007 | 5400 (Harpertown) | 2x2 C, 6 MB L2/2C
Nehalem | Nehalem-EP 45 nm | 3/2009 | 5500 (Gainestown) | 1x4 C, ¼ MB L2/C, 8 MB L3
Nehalem | Westmere-EP 32 nm | 3/2010 | 56xx (Gulftown) | 1x6 C, ¼ MB L2/C, 12 MB L3
Nehalem | Nehalem-EX 45 nm | 3/2010 | 6500 (Beckton) | 1x8 C, ¼ MB L2/C, 24 MB L3
Nehalem | Westmere-EX 32 nm | 4/2011 | E7-28xx (Westmere-EX) | 1x10 C, ¼ MB L2/C, 30 MB L3
Sandy Bridge | Sandy Bridge 32 nm | 1/2011 | |
Sandy Bridge | Ivy Bridge 22 nm | 11/2012? | |
Table 2.2: Overview of Intel’s multicore DP server processors
2.1 Processors (6)
Basic Arch. | Core/technology | Intro. | MP server processors | Cores and caches
Pentium 4 (Prescott) | Pentium 4 90 nm | 11/2005 | Paxville MP | 2x1 C, 2 MB L2/C
Pentium 4 (Prescott) | Pentium 4 65 nm | 8/2006 | 7100 (Tulsa) | 2x1 C, 1 MB L2/C, 16 MB L3
Core 2 | Core 2 65 nm | 9/2007 | 7200 (Tigerton DC) | 1x2 C, 4 MB L2/C
Core 2 | Core 2 65 nm | 9/2007 | 7300 (Tigerton QC) | 2x2 C, 4 MB L2/C
Core 2 | Penryn 45 nm | 9/2008 | 7400 (Dunnington) | 1x6 C, 3 MB L2/2C, 16 MB L3
Nehalem | Nehalem-EP 45 nm | | |
Nehalem | Westmere-EP 32 nm | | |
Nehalem | Nehalem-EX 45 nm | 3/2010 | 7500 (Beckton) | 1x8 C, ¼ MB L2/C, 24 MB L3
Nehalem | Westmere-EX 32 nm | 4/2011 | E7-48xx (Westmere-EX) | 1x10 C, ¼ MB L2/C, 30 MB L3
Sandy Bridge | Sandy Bridge 32 nm | /2011 | |
Sandy Bridge | Ivy Bridge 22 nm | 11/2012 | |
Table 2.3: Overview of Intel’s multicore MP server processors
2.1 Processors (7)
2.2. Buses interconnecting platform components
2.2 Buses interconnecting platform components (1)
Use of buses in Intel’s DT/DP and MP platforms
• Buses interconnecting processors (in NUMA topologies)
• Buses interconnecting processors to chipsets
• Buses interconnecting MCHs to ICHs (in 2-part chipsets)
Remark
Buses connecting the memory subsystem with the main body of the platforms are memory specific interfaces and will be discussed in Section 4.
Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores)
[Figure, components shown: two processors, each a Xeon 6500 (Nehalem-EX, Beckton), 8C, or a Xeon E7-2800 (Westmere-EX), 10C; QPI links; 7500 IOH; ESI; ICH10; ME; per processor SMI links to four SMBs with DDR3-1067 memory]
SMI: Serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion
Implementation of buses used in Intel’s DT/DP and MP platforms
Parallel buses
• FSB (Front Side Bus): 64-bit wide; used to interconnect processors to chipsets in previous platforms
• HI1.5: 8-bit wide; used to interconnect MCHs to ICHs in previous platforms
Serial buses (point-to-point interconnection)
• DMI (Direct Media Interface), ESI (Enterprise System Interface), DMI2 (Direct Media Interface 2.G.): 4-bit wide (4 PCIe lanes); used to interconnect processors to chipsets or MCHs to ICHs
• QPI (Quick Path Interconnect), QPI1.1 (Quick Path Interconnect v.1.1): 16-bit wide; used to interconnect processors to processors and processors to chipsets
2.2 Buses interconnecting platform components (2)
Buses used in Intel’s DT/DP/MP platforms
Buses interconnecting processors (in NUMA topologies)
• Serial bus: QPI (2008): 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction
Buses interconnecting processors to chipsets
• Parallel bus: FSB (64-bit: 1993): 64-bit wide, ~150 lines, 3.2-12.8 GB/s total in both directions
• Serial bus, low-cost systems: DMI/ESI (2008) (2): 4 PCIe lanes, 18 lines, 1 GB/s/direction; DMI2 (2011): 4 PCIe lanes, 18 lines, 2 GB/s/direction
• Serial bus, high-performance systems: QPI (2008): 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction; QPI1.1 (2012?): specification n.a.
Buses interconnecting MCHs to ICHs (in 2-part chipsets)
• Parallel bus: HI 1.5 (1999): 8-bit wide, 16 lines, 266 MB/s total in both directions
• Serial bus: DMI/ESI (2004) (1): 4 PCIe lanes, 18 lines, 1 GB/s/direction; DMI2 (2011): 4 PCIe lanes, 18 lines, 2 GB/s/direction
2.2 Buses interconnecting platform components (3)
Remarks
(1) DMI: Introduced as an interface between the MCH and the ICH first along with the ICH6, supporting Pentium 4 Prescott processors, in 2004.
(2) DMI: Introduced as an interface between the processors and the chipset first between Nehalem-EP and the 34xx PCH, in 2008, after the memory controllers were placed on the processor die.
2.2 Buses interconnecting platform components (4)
Figure 2.4: Signal types used in MMs for control, address and data signals
Signaling used in buses
• Single ended signals (typ. voltage swings 3.3-5 V): TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5
• Voltage referenced signals (typ. voltage swings 600-800 mV): SSTL: SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3); RSL (RDRAM); FSB
• Differential signals (typ. voltage swings 200-300 mV): LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs; DRSL: XDR (data)
Voltage swings become smaller from single ended through voltage referenced to differential signaling.
LVDS: Low Voltage Differential Signaling
LVTTL: Low Voltage TTL
(D)RSL: (Differential) Rambus Signaling Level
SSTL: Stub Series Terminated Logic
VCM: Common Mode Voltage
VREF: Reference Voltage
2.2 Buses interconnecting platform components (5)
Main features of parallel buses used in Intel’s MC platforms
Feature | FSB | HI 1.5
Typical use | Connecting the processors and the chipset | Connecting MCH and ICH
Introduced | With the Pentium (1993) | With the Pentium III (1999)
Width | 64 bit | 8 bit
Clock | 100-400 MHz | 66 MHz
DDR/QDR | QDR since Pentium 4 (2000) | QDR
Transfer rate | 400-1600 MT/s | 266 MT/s
Bandwidth | 3.2-12.8 GB/s in both directions altogether | 266 MB/s in both directions altogether
Signaling | Voltage referenced data signals | Single-ended data signals
No. of lines | ~150 lines | ~16 lines
FSB/HI 1.5: Bus type interconnects
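The bandwidth figures of the parallel buses follow directly from bus width and transfer rate. The short sketch below recomputes them from the table values; the function name is an assumption, and the result is the peak figure shared by both transfer directions.

```python
def parallel_bus_bandwidth_mb_s(width_bits, transfer_rate_mt_s):
    """Peak bandwidth in MB/s: bytes per transfer times transfers per second."""
    return width_bits // 8 * transfer_rate_mt_s


# FSB: 64 bit wide, 400 to 1600 MT/s (QDR at 100 to 400 MHz clock).
fsb_low = parallel_bus_bandwidth_mb_s(64, 400)
fsb_high = parallel_bus_bandwidth_mb_s(64, 1600)

# HI 1.5: 8 bit wide, 266 MT/s (QDR at 66 MHz clock).
hi15 = parallel_bus_bandwidth_mb_s(8, 266)

print(f"FSB:    {fsb_low / 1000:.1f} to {fsb_high / 1000:.1f} GB/s")
print(f"HI 1.5: {hi15} MB/s")
```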
2.2 Buses interconnecting platform components (6)
Main features of serial buses used in Intel’s MC platforms
Feature | DMI/ESI | DMI2 | QPI | QPI 1.1
Typical use | To interconnect MCHs and ICHs or processors to chipsets in NUMA platforms (DMI/ESI, DMI2) | To interconnect processors in NUMA topologies or processors to chipsets (QPI, QPI 1.1)
Introduced | In connection with 2. gen. Nehalem in 2008 | In connection with Sandy Bridge in 2011 | In connection with Nehalem-EP in 2008 | In connection with Sandy Bridge in 2012 (?)
Width | 4 PCIe lanes | 4 PCIe 2.0 lanes | 20 lanes | No specification available yet
Clock | 2.5 GHz | 5 GHz | 2.4/2.93/3.2 GHz |
DDR | – | – | DDR |
Encoding | 10bit/8bit | 10bit/8bit | no |
Bandwidth/direction | 1 GB/s | 2 GB/s | 9.6/11.72/12.8 GB/s |
Signaling | LVDS | LVDS | LVDS |
No. of lines | 18 lines | 18 lines | 84 lines |
DMI/QPI: Point-to-point interconnection
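The per-direction bandwidths of the serial buses can be recomputed from the lane count, the clock, the DDR attribute and the encoding. The sketch below assumes that only 16 of the 20 QPI lanes carry data (the QPI signal breakdown in these slides is 16 data, 2 protocol and 2 CRC signals) and that 10bit/8bit encoding leaves 8/10 of the raw bit rate for payload; the function name is hypothetical.

```python
def serial_bandwidth_gb_s(data_lanes, clock_ghz, transfers_per_clock,
                          payload_fraction):
    """Peak payload bandwidth per direction in GB/s."""
    gbit_per_s = data_lanes * clock_ghz * transfers_per_clock * payload_fraction
    return gbit_per_s / 8  # 8 bits per byte


# DMI/ESI: 4 lanes, 2.5 GHz, no DDR, 10bit/8bit encoding.
dmi = serial_bandwidth_gb_s(4, 2.5, 1, 8 / 10)
# DMI2: 4 lanes, 5 GHz, no DDR, 10bit/8bit encoding.
dmi2 = serial_bandwidth_gb_s(4, 5.0, 1, 8 / 10)
# QPI: 16 data lanes (assumed), DDR, no encoding overhead.
qpi = [serial_bandwidth_gb_s(16, clock, 2, 1.0) for clock in (2.4, 2.93, 3.2)]

print(f"DMI/ESI: {dmi:.2f} GB/s, DMI2: {dmi2:.2f} GB/s")
print("QPI:", ", ".join(f"{bw:.2f} GB/s" for bw in qpi))
```

Under these assumptions the results agree with the Bandwidth/direction row of the table.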
2.2 Buses interconnecting platform components (7)
Comparing main features of Intel’s FSB and QPI [9]
2.2 Buses interconnecting platform components (8)
GTL+: A kind of voltage referenced signaling
Figure 2.5: LVDS Single Link Interface Circuit [10]
Principle of LVDS signal transmission used in serial buses
2.2 Buses interconnecting platform components (9)
PCI Express Data Frame [10]
PCIe packet format (data frames)
The related fields are:
Field | Interpretation
Frame | 1-byte Start-of-Frame/End-of-Frame
Seq# | 2-byte Sequence Number
Header | 16- or 20-byte Header
Data | 0-4096-byte Data field
CRC | 4-byte ECRC (End-to-End CRC) + 4-byte LCRC (Link CRC) (CRC: Cyclic Redundancy Check)
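The field sizes above determine how much of a frame is payload. The sketch below adds up the framing overhead and computes the payload efficiency; it assumes one framing byte at the start and one at the end of each frame and that both ECRC and LCRC are present, as listed in the table. The function names are illustrative.

```python
START_FRAME = 1   # bytes, assumed: one Start-of-Frame byte
END_FRAME = 1     # bytes, assumed: one End-of-Frame byte
SEQ_NUMBER = 2    # bytes
ECRC = 4          # bytes, End-to-End CRC
LCRC = 4          # bytes, Link CRC


def frame_overhead(header_bytes):
    """Non-payload bytes of one data frame for a 16- or 20-byte header."""
    if header_bytes not in (16, 20):
        raise ValueError("header is 16 or 20 bytes")
    return START_FRAME + SEQ_NUMBER + header_bytes + ECRC + LCRC + END_FRAME


def payload_efficiency(data_bytes, header_bytes=16):
    """Fraction of the transmitted frame bytes that is payload."""
    if not 0 <= data_bytes <= 4096:
        raise ValueError("data field is 0 to 4096 bytes")
    return data_bytes / (data_bytes + frame_overhead(header_bytes))


for size in (64, 256, 4096):
    print(f"{size:4d}-byte payload: {payload_efficiency(size):.1%} efficiency")
```

The loop illustrates why small payloads use the link less efficiently than large ones.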
2.2 Buses interconnecting platform components (10)
[Figure, labels recoverable: TX unidirectional link and RX unidirectional link; signals: 16 data, 2 protocol, 2 CRC]
Figure 2.6: Signals of the QuickPath Interconnect bus (QPI-bus) [11]
Principle of the QuickPath Interconnect bus (QPI bus)
2.2 Buses interconnecting platform components (11)
2.3. The memory subsystem
2.3.1. Key parameters of the memory subsystem
2.3.2. Main attributes of the memory technology used
2.3.2.1. Overview: Main attributes of the memory technology used
2.3.2.2. Memory type
2.3.2.3. Speed grades
2.3.2.4. DIMM density
2.3.2.5. Use of ECC support
2.3.2.6. Use of registering
2.3.1 Key performance parameters of the memory subsystem (1)
2.3.1 Key performance parameters of the memory subsystem
This issue will be discussed in Section 4.
2.3.2 Main attributes of the memory technology used
2.3.2.1 Overview: Main attributes of the memory technology used
Main attributes of the memory technology used
• Memory type (Section 2.3.2.2)
• Speed grade (Section 2.3.2.3)
• DIMM density (Section 2.3.2.4)
• Use of ECC support (Section 2.3.2.5)
• Use of registering (Section 2.3.2.6)
2.3.2.2 Memory type (1)
a) Overview: Main DRAM types
[Figure: DRAMs for general use (commodity DRAMs) with their year of introduction]
• Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
• Synchronous DRAMs: SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007)
• Further types shown: DRDRAM (1999), XDR (2006) (1), FB-DIMM (2006)
The figure also distinguishes DRAMs with parallel bus connection from DRAMs with serial bus connection, and mainstream DRAM types from challenging DRAM types.
(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
2.3.2.2 Memory type
b) Synchronous DRAMs (SDRAM, DDR, DDR2, DDR3)
2.3.2.2 Memory type (2)
SDRAM to DDR3 DIMMs
• SDRAM: 168-pin
• DDR: 184-pin
• DDR2: 240-pin
• DDR3: 240-pin
All these DIMM modules are 8-byte wide.
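Since all these DIMMs are 8 bytes wide, the peak transfer bandwidth of one DIMM channel follows from the speed grade alone. The sketch below is a worked example of that arithmetic; the function name is an assumption, and the figures are theoretical peak values, not sustained bandwidth.

```python
DIMM_WIDTH_BYTES = 8  # data width of SDRAM to DDR3 DIMMs


def dimm_peak_bandwidth_mb_s(speed_grade_mt_s):
    """Peak bandwidth of one 8-byte wide DIMM channel in MB/s."""
    return DIMM_WIDTH_BYTES * speed_grade_mt_s


examples = {
    "SDRAM-100": 100,
    "DDR-400": 400,
    "DDR2-800": 800,
    "DDR3-1067": 1067,
}

for name, rate in examples.items():
    print(f"{name}: {dimm_peak_bandwidth_mb_s(rate) / 1000:.1f} GB/s peak")
```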
2.3.2.2 Memory type (3)
Principle of operation of synchronous DRAMs (SDRAM to DDR3 memory chips)
[Figure, components shown: DIMM with DRAM devices; each DRAM device consists of a memory cell array and I/O buffers; memory controller (MC)]
The memory cell array sources/sinks data to/from the I/O buffers
• at a rate of fCell,
• at a width of FW.
The I/O buffers receive/transmit data from/to the MC. Data transmission occurs
• at a rate of fCK (SDRAM) or 2 x fCK (DDR to DDR3),
• on the rising edge of the strobe (CK) for SDRAMs or on both edges of the strobe (DQS) for DDR/DDR2/DDR3.
2.3.2.2 Memory type (4)
Sourcing/sinking data by the memory cell array
The memory cell array sources/sinks data to/from the I/O buffers
• at a rate of fCell, where fCell is the clock frequency of the memory cell array,
• at a data width of FW, where FW is the fetch width of the memory cell array.
The core clock frequency of the memory cell array (fCell)
• fCell is 100 to 200 MHz.
• It stands in a given ratio with the clock frequency of the memory device (fCK) as follows:
DRAM type | fCK
SDRAM | fCell
DDR | fCell
DDR2 | 2 x fCell
DDR3 | 4 x fCell
Raising fCell from 100 MHz to 200 MHz characterizes the evolution of each memory technology
• When a new memory technology (e.g. DDR2 or DDR3) appears, fCell is initially 100 MHz; this sets fCK and the initial speed grade accordingly (e.g. to 400 MT/s for DDR2 or to 800 MT/s for DDR3).
• As the memory technology evolves, fCell will be raised from 100 MHz to 133, 167 and to 200 MHz.
• Along with fCell, fCK and the final speed grade will also be raised.
2.3.2.2 Memory type (5)
The fetch width (FW) of the memory cell array
It specifies how many times more bits the cell array fetches per column cycle than the data width of the device (xn).
E.g. a 4-bit wide DRAM device (x4 DRAM chip) with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4, that is 16 bits, from the memory cell array in every fCell cycle.
The fetch width (FW) of the memory cell array of synchronous DRAMs is as follows:
DRAM type | FW
SDRAM | 1
DDR | 2
DDR2 | 4
DDR3 | 8
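The fetch width table can be turned into a short calculation: the cell array delivers FW × n bits per fCell cycle for an n-bit wide device, which corresponds to FW transfers per data line in each fCell cycle. The sketch below works through the x4 DDR2 example from the text; the function names are assumptions.

```python
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def bits_per_cell_cycle(dram_type, device_width_bits):
    """Bits fetched from the memory cell array in one fCell cycle."""
    return FETCH_WIDTH[dram_type] * device_width_bits


def per_pin_rate_mt_s(dram_type, f_cell_mhz):
    """Transfers per data line needed to drain the fetched bits (MT/s)."""
    return FETCH_WIDTH[dram_type] * f_cell_mhz


# The x4 DDR2 device from the text: 4 x 4 = 16 bits per fCell cycle.
print(bits_per_cell_cycle("DDR2", 4))
# A DDR3 array clocked at 100 MHz feeds 800 MT/s per data line.
print(per_pin_rate_mt_s("DDR3", 100))
```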
2.3.2.2 Memory type (6)
Transferring data between the I/O Buffers and the Memory Controller
Data transmission between the I/O buffers and the Memory Controller is clocked by a frequency of fCK.
Data transmission occurs
• for SDRAMs at the rising edge of the strobe signal (CK)
• for DDR/DDR2/DDR3 at both edges of the strobe signal (DQS), designated as the Double Data Rate transfer)
The final transfer rate (speed grade) results in
• fCK for SDRAMs
• 2 x fCK for DDR/DDR2/DDR3
Accordingly, typical speed grade ranges cover
• 100 to 200 MT/s for SDRAM devices,
• 200 to 400 MT/s for DDR devices,
• 400 to 800 MT/s for DDR2 devices and
• 800 to 1600 MT/s for DDR3 devices.
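The speed grade ranges listed above follow directly from the fCK ratios and the fetch widths given earlier. The Python sketch below is only an illustration of that arithmetic; the names `DRAM_TYPES` and `speed_grade` are hypothetical and introduced here, while the numeric values are taken from the fCK and FW tables of this section.

```python
# Per DRAM type: (fCK / fCell ratio, transfers per fCK cycle, fetch width FW).
# Values follow the tables of this section; the names are illustrative assumptions.
DRAM_TYPES = {
    "SDRAM": (1, 1, 1),
    "DDR": (1, 2, 2),
    "DDR2": (2, 2, 4),
    "DDR3": (4, 2, 8),
}


def speed_grade(dram_type, f_cell_mhz):
    """Return the transfer rate in MT/s for a given cell array clock in MHz."""
    ck_ratio, transfers_per_cycle, fetch_width = DRAM_TYPES[dram_type]
    f_ck = ck_ratio * f_cell_mhz
    rate = transfers_per_cycle * f_ck
    # The cell array delivers FW x n bits per fCell cycle, so the interface
    # has to move FW transfers of n bits each in the same time.
    assert rate == fetch_width * f_cell_mhz
    return rate


for name in DRAM_TYPES:
    print(name, speed_grade(name, 100), "to", speed_grade(name, 200), "MT/s")
```

The internal assertion checks that the rate supplied by the cell array (FW x fCell) and the rate consumed by the interface agree for every DRAM type.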
2.3.2.2 Memory type (7)
Examples (data transfer over the data lines DQ0 - DQn-1):
• DDR3-800 (DDR3 SDRAM): DRAM core clock (fCell) 100 MHz; clock (CK/CK#) 400 MHz; data strobe (DQS) 400 MHz. The memory cell array passes 8 x n bits to the I/O buffers, which transfer n bits on both edges of DQS (2 x fCK): 800 MT/s.
• DDR2-400 (DDR2 SDRAM): DRAM core clock (fCell) 100 MHz; clock (CK/CK#) 200 MHz; data strobe (DQS) 200 MHz. The memory cell array passes 4 x n bits to the I/O buffers, which transfer n bits on both edges of DQS (2 x fCK): 400 MT/s.
• DDR-200 (DDR SDRAM): DRAM core clock (fCell) 100 MHz; clock (CK/CK#) 100 MHz; data strobe (DQS) 100 MHz. The memory cell array passes 2 x n bits to the I/O buffers, which transfer n bits on both edges of DQS: 200 MT/s.
• SDRAM-100 (SDRAM): DRAM core frequency (fCell) 100 MHz; clock (CK) frequency (fCK) 100 MHz. The memory cell array passes n bits to the I/O buffers, which transfer n bits on the rising edges of CK: 100 MT/s.
The main technique to increase memory speed
Smaller voltage swings → shorter signal rise/fall times → higher speed grades,
but lower voltage budget → higher requirements for signal integrity.
Relation between voltage swings and rise/fall times of signals
$Q = C_{in} \times V = I \times t$, hence $t_R \sim C_{in} \times V / I$
Q: Charge on the input capacitance of the line (Cin)
Cin: Input capacitance of the line
V: Voltage
I: Current strength of the driver
tR: Rise time
Memory type Voltage/Voltage swing
SDRAM 3.3 V
DDR 2.5 V
DDR2 1.8 V
DDR3 1.5 V
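To make the relation between voltage swing and rise time concrete, the following Python sketch evaluates tR ~ Cin x V / I for the four supply voltages listed in this section. The input capacitance (3 pF) and the driver current (10 mA) are assumed example values that do not come from the source, and the function name `rise_time_ns` is hypothetical.

```python
# Supply voltages (voltage swings) of the four memory types, in volts.
VOLTAGE_SWING = {"SDRAM": 3.3, "DDR": 2.5, "DDR2": 1.8, "DDR3": 1.5}

# Assumed example values, not taken from the source.
C_IN_PF = 3.0      # input capacitance of the line in pF (assumption)
I_DRIVE_MA = 10.0  # current strength of the driver in mA (assumption)


def rise_time_ns(c_in_pf, v_swing, i_drive_ma):
    """Estimate t_R ~ C_in * V / I; pF * V / mA yields nanoseconds."""
    return c_in_pf * v_swing / i_drive_ma


for name, volts in VOLTAGE_SWING.items():
    t_r = rise_time_ns(C_IN_PF, volts, I_DRIVE_MA)
    print(f"{name}: {volts} V -> about {t_r:.2f} ns")
```

Whatever values are assumed for Cin and I, the estimated rise time scales linearly with the voltage swing, so with the same driver a 1.5 V swing needs roughly 45 % of the time of a 3.3 V swing.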
2.3.2.2 Memory type (9)
Figure 2.7: Signaling alternatives of buses used with memories
(Figure placeholder. Memory types shown: FPM, EDO, SDRAM, DDR, DDR2, DDR3, RDRAM, FBDIMM, XDR, XDR2. Axes: signaling of data lines versus signaling of command, control and address lines. Both axes distinguish single ended (TTL, LVTTL), voltage referenced (RSL, SSTL) and differential (DRSL, LVDS) signaling.)
2.3.2.2 Memory type (10)
Table 2.4: Key features of synchronous DRAM devices
Feature | SDRAM | DDR SDRAM | DDR2 SDRAM | DDR3 SDRAM
JEDEC standard | JESD 21-C Release 4 | JESD 79 | JESD 79-2 | JESD 79-3
Key features | Synchronous, pipelined, burst oriented | Double data rate, 2n prefetch architecture | Double data rate, 4n prefetch architecture | Double data rate, 8n prefetch architecture
Standard, first release | JESD 21-C Release 4, 11/1993 | JESD 79, 6/2000 | JESD 79-2, 9/2003 | JESD 79-3, 6/2007
Standard, last release | | JESD 79E, 5/2005 | JESD 79-2C, 5/2006 |
Device density: 64 Mb; 128 Mb - 1 Gb; 256 Mb - 4 Gb; 256 Mb - 4 Gb; 512 Mb - 8 Gb
Organization: x4/8/16; x4/8/16; x4/8/16; x4/8/16; x4/8/16
Device speed (MT/s): 66; 100/133; 200/266; 200/266/333/400; 400/533/667/800; 800/1066/1333/1600
Device density: 4/16 Mb; 16-256 Mb; x8/16; 64-512 Mb; x8/16; 128-512 Mb; x8/16; 256 Mb - 1 Gb; x8/16; 256 Mb - 1 Gb; x8/16; 512 Mb - 16 Gb
Typ. processors: Pentium (3V); Pentium III; P4 (Willamette); P4 (Northwood); P4 (Prescott); P4 (Prescott); P4 (Presler); Pentium D; Core2 Duo; Core2 Duo to Sandy Bridge
Voltage | 3.3 V | 2.5 V | 1.8 V | 1.5 V
No. of pins on the module | 168 | 184 | 240 | 240
Key features of synchronous DRAM devices (SDRAM to DDR3)
2.3.2.2 Memory type (11)
Approximate appearance dates and speed grades of DDR DRAMs as well as the bandwidth provided by a dual channel memory subsystem
Bandwidth1
1 Bandwidth of a dual channel memory subsystem [12]
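The peak bandwidth of a dual channel memory subsystem can be estimated from the speed grade, since each channel is 8 bytes wide. The Python sketch below shows this calculation under that assumption; the function name `peak_bandwidth_gb_s` and its parameters are hypothetical, and the result is a theoretical peak in decimal GB/s rather than a measured figure.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s, channels=2, bytes_per_transfer=8):
    """Peak bandwidth in GB/s (10^9 bytes/s) of a memory subsystem.

    transfer_rate_mt_s: speed grade in MT/s (e.g. 1600 for DDR3-1600)
    channels: number of memory channels (2 for a dual channel subsystem)
    bytes_per_transfer: channel width in bytes (8 for a 64-bit wide DIMM)
    """
    return transfer_rate_mt_s * bytes_per_transfer * channels / 1000


for label, rate in (("DDR-400", 400), ("DDR2-800", 800), ("DDR3-1600", 1600)):
    print(label, peak_bandwidth_gb_s(rate), "GB/s")
```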
2.3.2.2 Memory type (12)
Green and ultra-low power memories
Green memories: lower dissipation memories
Ultra-low-power DDR3 memories: Use of 1.35 V supply voltage instead of 1.50 V to reduce dissipation
They represent the latest achievements of DRAM memory technology.
2.3.2.2 Memory type (13)
Green and ultra-low power memories - Examples [13]
2.3.2.2 Memory type (14)
SDRAM: 168-pin, DDR: 184-pin, DDR2: 240-pin, DDR3: 240-pin
8-byte wide memory modules (DIMMs)
2.3.2.2 Memory type (15)
(Figure labels: DRAM device, DIMM, keying (notch))
Types of DIMMs
DIMM (physical module): the physical carrier of ½, 1 or 2 ranks.
Rank (logical module):
• 64 data bits or 64 data + 8 ECC bits wide memory block,
• all devices of a rank will be activated by the same chip select (CS) signal.
Single-sided/double-sided DIMMs
• Single-sided DIMMs: DRAM devices are placed only on one DIMM side.
• Double-sided DIMMs: DRAM devices are placed on both DIMM sides.
2.3.2.2 Memory type (16)
Examples for single-sided and double sided DIMMs with single or dual ranks [45]
9 x 8-bit DDR devices
2.3.2.2 Memory type (17)
Example: Traditional way of attaching DIMMs via a parallel channel to the MCH [45]
2.3.2.2 Memory type (18)
Example 2: Attaching DIMMs via 3 parallel memory channels to memory controllers implemented on the processor die
(This is actually Intel’s Tylersburg DP platform, aimed at the Nehalem-EP processor, used for up to 6 cores) [46]
2.3.2.2 Memory type (18)
c) FB-DIMMs
DRAMs for general use (year of introduction)
Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
Synchronous DRAMs: SDRAM (1996), DRDRAM (1999), DDR (2000), DDR2 (2004), FB-DIMM (2006), XDR (2006) 1, DDR3 (2007)
Classifications shown: DRAMs with parallel bus connection vs. DRAMs with serial bus connection; main stream DRAM types vs. challenging DRAM types
1 Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
2.3.2.2 Memory type (19)
• Introduce packet-based serial transmission (like in the PCI-E, SATA, SAS buses)
• Introduce full buffering (registered DIMMs buffer only address and control signals)
• CRC error checking (cyclic redundancy check)
Principle of operation
2.3.2.2 Memory type (20)
The architecture of FB-DIMM memories [19]
2.3.2.2 Memory type (21)
Figure 2.8: Maximum supported FB-DIMM configuration [20] (6 channels/8 DIMMs)
2.3.2.2 Memory type (22)
• Serial (differential) transmission between the North Bridge and the DIMMs (each bit needs a pair of wires)
• Read packets (frames, bursts): 12 x 14 = 168 bits
  • 144 data bits (equals the number of data bits produced by a 72 bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles)
  • 24 CRC bits.
• Every 12 cycles (that is every two memory cycles) constitute a packet.
• Write packets (frames, bursts): 12 x 10 = 120 bits
  • 98 payload bits
  • 22 CRC bits.
• Clocked at 6 x the data rate of the DDR2 devices,
  e.g. for DDR2-667 DRAMs the clock rate is: 6 x 667 MHz ≈ 4 GHz
• Number of serial links
  • 14 read lanes (2 wires each)
  • 10 write lanes (2 wires each)
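The frame sizes quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration: all constant and function names are hypothetical, and the numbers are those stated in the list (14 read lanes, 10 write lanes, 12 cycles per frame, a 72 bit wide module, 22 write CRC bits, a link clock of 6 x the DDR2 data rate).

```python
READ_LANES = 14
WRITE_LANES = 10
CYCLES_PER_FRAME = 12
MODULE_WIDTH_BITS = 72  # 64 data bits + 8 ECC bits
WRITE_CRC_BITS = 22


def frame_bits(lanes, cycles=CYCLES_PER_FRAME):
    """Bits carried by one frame: one bit per lane in every cycle."""
    return lanes * cycles


def link_clock_mhz(ddr2_data_rate):
    """Link clock as 6 x the DDR2 data rate (e.g. 667 for DDR2-667)."""
    return 6 * ddr2_data_rate


read_frame = frame_bits(READ_LANES)            # 12 x 14 bits
read_data = 2 * MODULE_WIDTH_BITS              # two memory cycles of 72 bits
read_crc = read_frame - read_data              # remaining bits carry the CRC

write_frame = frame_bits(WRITE_LANES)          # 12 x 10 bits
write_payload = write_frame - WRITE_CRC_BITS   # payload left after the CRC

print("read frame:", read_frame, "bits =", read_data, "data +", read_crc, "CRC")
print("write frame:", write_frame, "bits =", write_payload, "payload +", WRITE_CRC_BITS, "CRC")
print("link clock for DDR2-667:", link_clock_mhz(667), "MHz")
```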
Implementation details (1)
2.3.2.2 Memory type (23)
98 payload bits.
• 2 frame type bits,
• 24 bits of command,
• 72 bits for data and commands, according to the frame type, e.g. 72 bits of data, 36 bits of data + one command or two commands.
Commands
• all commands include a 3-bit FB-DIMM module address to select one of 8 modules.
Implementation details (2)
2.3.2.2 Memory type (24)
FB-DIMM-4300 (DDR2-533 SDRAM): clock speed 133 MHz, data rate 533 MT/s, throughput 4300 MB/s
FB-DIMM-5300 (DDR2-667 SDRAM): clock speed 167 MHz, data rate 667 MT/s, throughput 5300 MB/s
FB-DIMM-6400 (DDR2-800 SDRAM): clock speed 200 MHz, data rate 800 MT/s, throughput 6400 MB/s
Figure 2.9: Different implementations of FB-DIMMs [48]
FB-DIMM data buffer (Advanced Memory Buffer, AMB): manages the read/write operations of the module
2.3.2.2 Memory type (25)
The notch (keying) differs from DDR2 DIMMs
(There are two Command/Address buses (C/A) to limit loads of 9 to 36 DRAMs)
Figure 2.10: Block diagram of the AMB [21]
2.3.2.2 Memory type (26)
S/P Converter
Necessary routing to connect the north bridge to the DIMM socket
a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed
b) In case of an FB-DIMM (69 pins): a 2-layer PCB is needed (but a third layer is used for power lines)
Figure 2.11: PCB routing [19]
2.3.2.2 Memory type (27)
Assessing benefits and drawbacks of FB-DIMM memories (as compared to DDR2/3 memories)
Benefits of FB-DIMMs
• more memory channels (up to 6)
• more DIMM modules (up to 8) per channel
Together these give a higher memory size and bandwidth: 6 x 8 = 48 DIMMs, that is, assuming 8 GB/DIMM, up to 384 GB.
Drawbacks of FB-DIMMs
• higher latency
• higher cost
• higher dissipation
(Typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, FB-DIMM with DDR2: about 10 W)
• same bandwidth figures as the parts they are based on (DDR2)
2.3.2.2 Memory type (28)
Latency [22]
• Due to their additional serialization tasks and daisy-chained nature FB-DIMMs have about 15 % higher overall average latency than DDR2 memories.
Production
The production of FB-DIMMs stopped with DDR2-800 modules; no DDR3-based modules came to the market due to the drawbacks of the technology.
2.3.2.2 Memory type (29)
2.3.2.3 Speed grades (1)
Overview of the speed grades of DDR DRAMs
Bandwidth1
1 Bandwidth of a dual channel memory subsystem [12]
2.3.2.3 Speed grades
Remark
Speed grades of FSBs and DRAMs were defined at the time when the base clock frequency of the FSBs was 133 MHz (around 2000).
Subsequent speed grades of FSBs and also those of the memories were then chosen as integral multiples of 133 MHz, such as
266 ≈ 2 x 133, 400 ≈ 3 x 133, 533 ≈ 4 x 133, 667 ≈ 5 x 133, 800 ≈ 6 x 133, 1067 ≈ 8 x 133, 1333 ≈ 10 x 133, 1600 ≈ 12 x 133 etc.
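The relation between the speed grades and the 133 MHz base clock can be reproduced numerically. The Python sketch below assumes that the nominal 133 MHz base stands for 400/3 = 133.33 MHz; the function name `grade_from_multiple` is hypothetical and only serves this illustration.

```python
def grade_from_multiple(multiple):
    """Speed grade obtained as an integral multiple of 133.33 MHz (400/3)."""
    return round(multiple * 400 / 3)


for multiple in (2, 3, 4, 5, 6, 8, 10, 12):
    print(multiple, "x 133.33 =", grade_from_multiple(multiple))
```

Note that the multiple 2 yields 267 in this calculation, whereas the corresponding grade is marketed as 266.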
2.3.2.3 Speed grades (2)
Figure 2.12: The evolution of peak transfer rates of parallel connected synchronous DRAMs as manifested in Intel’s chipsets
(Figure placeholder. Vertical axis: transfer rate (MT/s), logarithmic scale from 10 to 5000; horizontal axis: year, 1996 to 2008. Data points: SDRAM 66, SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800, DDR3 1333, DDR3 1600. Trend: ~10x/10 years.)
Rate of increasing the transfer rates in synchronous DRAMs
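A growth of about 10x in 10 years, as indicated in Figure 2.12, can be converted into an annual growth factor and a doubling time. The Python sketch below does this conversion; the function names are hypothetical, and the calculation assumes a steady exponential growth, which is a simplification of the actual step-wise evolution.

```python
import math


def annual_growth_factor(total_factor, years):
    """Yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1 / years)


def doubling_time_years(total_factor, years):
    """Years needed to double at the same steady exponential rate."""
    return years * math.log(2) / math.log(total_factor)


factor = annual_growth_factor(10, 10)
print(f"annual growth: about {100 * (factor - 1):.0f} % per year")
print(f"doubling time: about {doubling_time_years(10, 10):.1f} years")
```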
2.3.2.3 Speed grades (3)
Figure 2.13: Evolution of DRAM densities (Mbit) and no. of units shipped/year (Based on [23])
(Figure placeholder. Horizontal axis: year, 1980 to 2015; vertical axis: units shipped per year (x 10^6), 500 to 2000. Device densities shown: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G. Trend: density ~4×/4Y.)
a) Device density
2.3.2.4. DIMM density
2.3.2.4 DIMM density (1)
b) DIMM (module) density
2.3.2.4 DIMM density (2)
Based on
• typical device densities of 1 to 8 Gb and
• typical widths of x4 to x16 (bits),
DDR2 or DDR3 modules provide typical densities of up to 8 or 16 GB.
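The module densities quoted above follow from the device density, the device width and the number of ranks, given that a rank provides 64 data bits. The Python sketch below illustrates this; the function name `dimm_capacity_gb` and its parameters are hypothetical, and ECC devices are left out of the capacity count.

```python
def dimm_capacity_gb(device_density_gbit, device_width_bits, ranks=1,
                     rank_width_bits=64):
    """Data capacity of a DIMM in GB.

    device_density_gbit: density of one DRAM device in Gb
    device_width_bits: data width of one device (4, 8 or 16)
    ranks: number of ranks on the module
    rank_width_bits: data width of a rank (64 bits, ECC devices not counted)
    """
    devices_per_rank = rank_width_bits // device_width_bits
    return ranks * devices_per_rank * device_density_gbit / 8


print(dimm_capacity_gb(1, 16))          # one rank of four 1 Gb x16 devices
print(dimm_capacity_gb(4, 8, ranks=2))  # two ranks of eight 4 Gb x8 devices
print(dimm_capacity_gb(8, 4))           # one rank of sixteen 8 Gb x4 devices
```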
2.3.2.5 Use of ECC support
2.3.2.5 Use of ECC support (1)
ECC basics (as used in DIMMs)
Implemented as SEC-DED (Single Error Correction, Double Error Detection)
Single bit error correction
For D data bits P check-bits are added.
Figure: The code word (D data bits followed by P check bits)
The minimum number of check-bits (P) for single bit error correction?
Requirement: $2^{P} \ge$ the minimum number of states to be distinguished.
The minimum number of states to be distinguished:
• It is needed to specify the bit position of a possible single bit error in the code word consisting of both data and check bits. This requires D + P states,
• one additional state is needed to specify the "no error" state.
Accordingly, the minimum number of states to be distinguished is: D + P + 1.
To implement single bit error correction, the minimum number of check bits (P) needs to satisfy the requirement:
$2^{P} \ge D + P + 1$
2.3.2.5 Use of ECC support (2)
Double bit error detection
An additional parity bit is needed to check for an additional error.
Then the minimum number of check-bits (CB) needed for SEC-DED is:
$CB = P + 1$
Since $2^{P} \ge D + P + 1$ and $P = CB - 1$:
$2^{CB-1} \ge D + CB - 1 + 1$, that is $2^{CB-1} \ge D + CB$
Table 2.5: The number of check-bits (CB) needed for D data bits
Data bits (D) Check bits (CB)
1 2
3:2 3
7:4 4
15:8 5
31:16 6
63:32 7
127:64 8
255:128 9
511:256 10
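The requirement on the number of check bits can be evaluated directly for data widths that are common in memory systems. The Python sketch below searches for the smallest P satisfying 2^P ≥ D + P + 1 and then adds the extra parity bit for double error detection; the function names are hypothetical and introduced only for this illustration.

```python
def sec_check_bits(data_bits):
    """Smallest P with 2**P >= D + P + 1 (single bit error correction)."""
    p = 1
    while 2 ** p < data_bits + p + 1:
        p += 1
    return p


def secded_check_bits(data_bits):
    """Check bits for SEC-DED: CB = P + 1."""
    return sec_check_bits(data_bits) + 1


for d in (8, 16, 32, 64, 128):
    print(d, "data bits ->", secded_check_bits(d), "check bits")
```

For D = 64 this yields 8 check bits, which corresponds to the 64 data + 8 ECC bits wide rank of an ECC DIMM.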
2.3.2.5 Use of ECC support (3)
Support of ECC and registering in DT and DP/MP platforms
DT memories typically do not support ECC or registered (buffered) DIMMs,
Servers typically make use of registered DIMMs with ECC protection.
2.3.2.5 Use of ECC support (4)
Typical implementation of ECC protected registered DIMMs (used typically in servers)
Figure 2.14: Typical layout of a registered memory module with ECC [14] (labeled parts: ECC, Register, Register, PLL)
Main components
• Two register chips, for buffering the address and command lines
• A PLL (Phase Locked Loop) unit for deskewing clock distribution.
2.3.2.5 Use of ECC support (5)
2.3.2.6 Use of registering (1)
Problems arising while implementing higher memory capacities
Higher memory capacities need more modules → higher loading of the lines → signal integrity problems
Remedy: buffering address and command lines, phase locked clocking of the modules
2.3.2.6 Use of registering
Registering
Principle: buffering address and control lines
• to reduce signal loading in a memory channel,
• in order to increase the number of supported DIMM slots (max. memory capacity), needed first of all in servers.
2.3.2.6 Use of registering (2)
Implementation of registering
By means of a register chip that buffers address and control lines.
Figure 2.15: Registered signals in case of an SDRAM memory module [15] (REGISTER; REGE: register enable signal)
Note: Data (DQ) and data strobe (DQS) signals are not registered, as only address and control signals are common for all memory chips.
2.3.2.6 Use of registering (3)
Number of register chips required
• Synchronous memory modules (SDRAM to DDR3 DIMMs) have about 20 – 30 address and control lines,
• Register chips buffer usually 14 lines,
Typically, two register chips are needed per memory module [16].
2.3.2.6 Use of registering (4)
Figure 2.17: Example. Block diagram of a registered DDR DIMM [16]
(Block diagram placeholder: nine SDRAM devices with data from/to the motherboard; two PI74SSTV16857 registers for the address/control signals from the motherboard; one PI6CV857 PLL for the input clock.)
Example: Block diagram of a registered DDR DIMM
2.3.2.6 Use of registering (5)
Typical layout of registered DIMMs
Figure 2.16: Typical layout of a registered memory module with ECC [14] (labeled parts: ECC, Register, Register, PLL)
• Two register chips, for buffering the address and command lines
• A PLL (Phase Locked Loop) unit for deskewing clock distribution.
2.3.2.6 Use of registering (6)
Figure 2.18: Registered DIMM module with ECC [14]
2.3.2.6 Use of registering (7)
Typical use of registered DIMMs (RDIMMs): in servers (memory capacities: a few tens of GB to a few hundreds of GB)
Typical use of unregistered DIMMs (UDIMMs): in desktops/laptops (memory capacities: up to a few GB)
2.3.2.6 Use of registering (9)
5. References
5. References (1)
[1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino
[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr. 24 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server+Architecture%3B+Platform...-a053949226
[3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/
[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29.
[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004, http://pcworld.about.net/news/Dec072004id118866.htm
[6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004, http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf
[7]: Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations, Intel’s Press release, Aug. 2, 2004, http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm
[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March 1 2005, http://www.silentpcreview.com/article224-page2
[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf
[10]: Davis L. PCI Express Bus, http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html
5. References (2)
[11]: Ng P. K., “High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor,” IDF Taipei, TDPS001, 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDF-Taipei_TDPS001_100.pdf
[12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor/products/dram/Products_ComputingDRAM.html
[13]: Samsung’s Green DDR3 - Solution 3, 20nm class 1.35V, Sept. 2011, http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/Documents/downloads/green_ddr3_2011.pdf
[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org
[15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf
[16]: Solanki V., “Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf
[17]: Fisher S., “Technical Overview of the 45 nm Next Generation Intel Core Microarchitecture (Penryn),” IDF 2007, ITPS001, http://isdlibrary.intel-dispatch.com/isd/89/45nm.pdf
[18]: Razin A., Core, Nehalem, Gesher. Intel: New Architecture Every Two Years, Xbit Laboratories, 04/28/2006, http://www.xbitlabs.com/news/cpu/display/20060428162855.html
[19]: Haas, J. & Vogt P., “Fully buffered DIMM Technology Moves Enterprise Platforms to the Next Level,” Technology Intel Magazine, March 2005, pp. 1-7
5. References (3)
[20]: “Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[21]: McTague M. & David H., “Fully Buffered DIMM (FB-DIMM) Design Considerations,” Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA-S009.pdf
[22]: Ganesh B., Jaleel A., Wang D., Jacob B., Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, 2007
[23]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf