Presented by : Nasser Hadjloo .

EMERGING TRENDS OF INTEL MICROPROCESSORS Presented by : Nasser Hadjloo http://Hajloo.wordpress.com

Page 1:

EMERGING TRENDS OF INTEL MICROPROCESSORS

Presented by: Nasser Hadjloo

http://Hajloo.wordpress.com

Page 2:

Design Considerations

Instruction-level parallelism.
Use of cache hierarchies and their management.
Higher clock speeds.
The Front Side Bus (FSB).
Multi-threading.
Power consumption and heating issues.
Etc.

Page 3:

Intel Architectures: NetBurst

NetBurst -> Core -> Nehalem -> Sandy Bridge

Page 4:

NetBurst Architecture

Page 5:

Features of NetBurst Architecture: Hyper-Threading

A single physical processor appears as two logical processors.
Each logical processor has its own set of registers and its own APIC (Advanced Programmable Interrupt Controller).
Increases resource utilization and improves performance.

Page 6:

Rapid Execution Engine

Arithmetic Logic Units (ALUs) run at twice the processor frequency.
Basic integer operations execute in 1/2 processor clock tick.
Provides higher throughput and reduced latency of execution.
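The half-tick figure above can be checked with a little arithmetic. The sketch below is only an illustration: the 3.0 GHz core clock is an assumed example value, and the function name is hypothetical.

```python
def alu_latency_ns(core_clock_ghz: float, alu_multiplier: float = 2.0) -> float:
    """Latency of one basic integer ALU operation, in nanoseconds.

    Assumes the ALU finishes one basic operation per ALU clock and that
    the ALU clock is alu_multiplier times the core clock (2x on the slide).
    """
    alu_clock_ghz = core_clock_ghz * alu_multiplier
    return 1.0 / alu_clock_ghz  # 1 / GHz gives nanoseconds


core_clock_ghz = 3.0  # assumed example value, not a figure from the slide
core_cycle_ns = 1.0 / core_clock_ghz
latency_ns = alu_latency_ns(core_clock_ghz)

print(f"core cycle: {core_cycle_ns:.3f} ns")
print(f"ALU operation: {latency_ns:.3f} ns "
      f"({latency_ns / core_cycle_ns:.1f} core clock ticks)")
```

With the multiplier fixed at 2, the ratio of ALU latency to core cycle time is one half regardless of the clock chosen.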

Page 7:

NetBurst Microarchitecture

Page 8:

Design Considerations

Deeper pipeline (20 stages): a higher branch misprediction penalty, but greater clock speeds and performance.
Techniques to hide penalties, such as parallel execution, buffering, and speculation.
Executes instructions dynamically and out of order.
Performance of a particular code sequence may vary depending on the state the machine was in when that code sequence was entered.
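The pipeline-depth trade-off can be sketched with a simple textbook model. Everything numeric below is an assumption chosen for illustration (branch fraction, misprediction rate, clock speeds, and a flush penalty set equal to the pipeline depth); none of it is measured data.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: float) -> float:
    """Cycles per instruction once misprediction flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


def time_per_instruction_ns(cpi: float, clock_ghz: float) -> float:
    """Average time per instruction in nanoseconds."""
    return cpi / clock_ghz


# Hypothetical workload: 20% branches, 5% of them mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Assumed designs: the flush penalty is approximated by the pipeline depth.
designs = {
    "deep (20 stages, 3.8 GHz)": (20, 3.8),
    "short (14 stages, 2.9 GHz)": (14, 2.9),
}

for name, (depth, clock_ghz) in designs.items():
    cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, depth)
    t = time_per_instruction_ns(cpi, clock_ghz)
    print(f"{name}: CPI = {cpi:.2f}, time per instruction = {t:.3f} ns")
```

In this model the deeper pipeline pays more cycles per mispredicted branch, so it only comes out ahead when its clock advantage outweighs the extra flush cost.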

Page 9:

Modifications in NetBurst

The Northwood design combined an increased cache size, a smaller 130 nm fabrication process, and Hyper-Threading technology.
Prescott had a heavily improved branch predictor, introduced the SSE3 SIMD instructions, and implemented Intel 64, Intel's branding for its compatible implementation of x86-64, the 64-bit version of the x86 architecture.
Smithfield put two Prescott cores on a single die; the later Presler consists of two Cedar Mill cores on two separate dies.

But this had problems...

Page 10:

Heading to Core

NetBurst -> Core -> Nehalem -> Sandy Bridge

Page 11:

Core Microarchitecture

Page 12:

Core Microarchitecture

Page 13:

Design Considerations of Core

L2 control unit (super-queue) = L2 controller (handles snoop requests) + bus control unit (handles data and I/O requests to and from the external bus).
The prefetching unit is extended to handle hardware prefetching separately for each core.
The shared L2 cache in the Core 2 Duo eliminates the need for on-chip L2-level cache coherence; coherence between the L1 caches of the two cores is handled within the chip.
Although the Core 2 Duo benefits from on-chip access to the other core's L1 cache, its performance is limited.

Page 14:

Features of Core Architecture

Multiple cores and hardware virtualization.
14-stage pipeline (shorter than NetBurst's).
Dual-core design with linked L1 caches and a shared L2 cache.
Macrofusion: two program instructions can be executed as one micro-operation.
Intel Intelligent Power Capability: manages run-time power consumption of the processor's execution cores.
Includes advanced power gating: ultra fine-grained control that turns on individual processor logic subsystems only when they are needed.
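Macrofusion can be pictured with a toy decoder. The sketch below is not Intel's decoder logic; it only assumes the commonly documented case in which a compare (cmp or test) followed directly by a conditional jump is fused into one micro-operation. All function names and the instruction strings are hypothetical.

```python
# Assumption: only cmp/test followed by a conditional jump is fused.
FUSIBLE_FIRST = {"cmp", "test"}


def is_conditional_jump(instruction: str) -> bool:
    """True for jcc-style mnemonics such as jne or jl, but not plain jmp."""
    mnemonic = instruction.split()[0]
    return mnemonic.startswith("j") and mnemonic != "jmp"


def decode(instructions: list[str]) -> list[str]:
    """Map instructions to micro-ops, fusing compare + conditional jump pairs."""
    micro_ops = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (current.split()[0] in FUSIBLE_FIRST
                and following is not None
                and is_conditional_jump(following)):
            micro_ops.append(f"{current} + {following}")  # one fused micro-op
            i += 2
        else:
            micro_ops.append(current)
            i += 1
    return micro_ops


program = ["mov eax, 1", "cmp eax, ebx", "jne loop", "add eax, 2"]
for op in decode(program):
    print(op)
```

Here four instructions decode into three micro-ops, which is the saving the slide refers to: fewer micro-ops occupy the pipeline for the same program.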

Page 15:

Modifications in Core

The Allendale core, with 2 MB of L2 cache, offers a smaller die size and therefore greater yields.
Merom, the first mobile version of the Core 2, places more emphasis on low power consumption to enhance notebook battery life.
Kentsfield was the first Intel desktop quad-core CPU. It comprises two separate silicon dies (each equivalent to a single Core 2 Duo) on one multi-chip module.
The Penryn design added new instructions, including SSE4.1.

Problem...

Page 16:

Problem with quad core

Page 17:

Heading to Nehalem

NetBurst -> Core -> Nehalem -> Sandy Bridge

Page 18:

Introduction

Core i7: new Intel CPU brand name for the business and high-end consumer markets
Core i5: processors intended for the mainstream consumer market
Core i3: processors intended for the entry-level consumer market

Page 19:

Features of Nehalem

Integrated Memory Controller
QuickPath Interconnect
Advanced Configuration and Power States
Improvements to the pipeline (L2 branch predictor, renamed return stack buffer, L2 TLB, etc.)
Hyper-Threading
SSE4.2 instructions
Three-level cache hierarchy

Page 20:

Core i7 History

It started with the Bloomfield architecture in 2008.
In 2009, the Lynnfield and Clarksfield models arrived.
Prior to 2010, all models were quad core.
In 2010, the Arrandale (dual-core) models arrived.
In 2010, the Gulftown (Extreme) models arrived, with six hyper-threaded cores.

Page 21:

Bloomfield

All models are named Core i7-9xx and use Socket 1366
Includes single-processor server parts sold as Xeon 35xx
Replaced Yorkfield processors
Uses a different socket from the other Core i CPUs, even the other 45 nm ones
On-die memory controller (uncore clock)
Uses a single QPI link instead of an FSB
Support for the SSE4.2 and SSE4.1 instruction sets

Page 22:

Bloomfield

32 KB L1 instruction and 32 KB L1 data cache per core
256 KB L2 cache (combined instruction and data) per core
8 MB L3 cache (combined instruction and data), "inclusive", shared by all cores
"Turbo Boost" technology allows all active cores to intelligently clock themselves up in steps of 133 MHz over the design clock rate, as long as the CPU's predetermined thermal and electrical requirements are still met
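The Turbo Boost stepping described above can be sketched as a loop that adds 133 MHz at a time while a limit check still passes. This is a simplified illustration, not Intel's control algorithm: the 3200 MHz base clock, the two-step ceiling, and the within_limits callback are all assumptions made for the example.

```python
from typing import Callable

TURBO_STEP_MHZ = 133  # step size stated on the slide


def turbo_clock_mhz(base_mhz: int, max_steps: int,
                    within_limits: Callable[[int], bool]) -> int:
    """Raise the clock one step at a time while the limits are still met.

    within_limits stands in for the CPU's thermal and electrical checks;
    it receives a candidate clock in MHz and returns True if it is allowed.
    """
    clock = base_mhz
    for _ in range(max_steps):
        candidate = clock + TURBO_STEP_MHZ
        if not within_limits(candidate):
            break
        clock = candidate
    return clock


BASE_MHZ = 3200  # assumed example base clock

# Plenty of headroom: both steps are taken.
print(turbo_clock_mhz(BASE_MHZ, 2, lambda mhz: True))

# A hypothetical limit at 3400 MHz allows only the first step.
print(turbo_clock_mhz(BASE_MHZ, 2, lambda mhz: mhz <= 3400))
```

Passing the limit check in as a function keeps the stepping rule separate from whatever thermal or power model decides that a given clock is acceptable.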

Page 23:

Lynnfield

Used in Core i5
There is no QPI; instead, the processor in its Socket 1156 connects directly to a southbridge over a 2.5 GT/s Direct Media Interface and to other devices over PCI Express links
Core i7 processors based on Lynnfield have Hyper-Threading, which is disabled in Lynnfield-based Core i5 processors

Page 24:

Lynnfield

Core i5-7xx, Core i7-8xx or Xeon X34xx
Replaced the Penryn-based Yorkfield processor
45 nm
Socket 1156, as opposed to Socket 1366
Integrates the Direct Media Interface and PCI Express links that were previously on a dedicated northbridge chip (called the memory controller hub or I/O hub)

Page 25:

Clarksfield

The mobile version of Lynnfield, available under the Core i7 Mobile brand
Quad core, 45 nm
Integrated PCI Express and DMI links
Core i7-7xxQM (6 MB), Core i7-8xxQM (8 MB), Core i7-9xxXM Extreme Edition (8 MB)
Replaced Penryn-QC

Page 26:

Arrandale

The second mobile CPU family, covering:
Core i7-6xx [UE, LE, E] (4 MB)
Core i5-5xx [UM, M, E] (3 MB), Core i5-4xxM (3 MB)
Core i3-3xxM
Celeron U3xxx (unreleased), P4xxx (2 MB)
Integrated graphics processing unit, but only two processor cores
32 nm, dual core
E-series processors are embedded versions with support for PCIe bifurcation and ECC memory

Page 27:

Clarkdale

Desktop version of Arrandale, 32 nm
In the Core brand, available only as Core i3 and Core i5; dual core
All support Intel's Hyper-Threading (HT)
Integrated graphics as well as PCI Express and DMI links
The Clarkdale processor package contains two dies: the actual 32 nm processor with the I/O connections, and the 45 nm graphics controller with the memory interface
Successor of Wolfdale (45 nm)

Page 28:

Clarkdale

Used in Intel Core, Pentium and Celeron
The Core i5 versions generally have all features enabled
Only the Core i5-661 lacks Intel VT-d and TXT, like the Core i3, which also does not support Turbo Boost or the AES new instructions
Pentium and Celeron versions do not have SMT and use a reduced amount of third-level cache

Page 29:

Gulftown or Westmere-EP

The Extreme Edition version of the Core i7, featuring 6 cores on a 32 nm process (referred to as Core i9 before launch)
Gulftown is the first six-core dual-socket processor from Intel
Hyper-Threading (for a total of 12 logical threads), 12 MB of cache, Turbo Boost and the Intel QuickPath connection bus
Uses the Westmere microarchitecture, a 32 nm shrink of Nehalem

Page 30:

Gulftown

50% higher performance than the Bloomfield Core i7-975
Includes Core i7-9xx and Core i7-9xxX (12 MB), Xeon 36xx, Xeon 56xx
Socket 1366

Page 31:

Specification

Page 32:

Nehalem Architecture

Page 33:

Nehalem Architecture

Page 34:

Design Considerations

Hyper-Threading is reintroduced to cater to the increasing number of thread-based applications.
Cores are placed on a single die to reduce latencies.
QuickPath Interconnect also helps achieve this.
L1 and L2 caches for each core and a large shared L3 cache improve performance.
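One way to see why an extra shared cache level helps is the standard average memory access time (AMAT) formula, applied level by level. The latencies and miss rates below are hypothetical round numbers picked for illustration, not Nehalem measurements.

```python
def amat_cycles(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time in cycles.

    levels lists (hit_latency_cycles, miss_rate) from L1 outward.
    AMAT = hit_1 + miss_1 * (hit_2 + miss_2 * (... + miss_n * memory_latency))
    """
    access_time = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        access_time = hit_latency + miss_rate * access_time
    return access_time


MEMORY_LATENCY = 200.0  # assumed DRAM latency in cycles

# Hypothetical (hit latency, miss rate) pairs.
L1 = (4.0, 0.10)
L2 = (10.0, 0.50)
L3 = (40.0, 0.20)

two_level = amat_cycles([L1, L2], MEMORY_LATENCY)
three_level = amat_cycles([L1, L2, L3], MEMORY_LATENCY)

print(f"two-level hierarchy:   {two_level:.1f} cycles")
print(f"three-level hierarchy: {three_level:.1f} cycles")
```

With these assumed numbers, the L3 intercepts most L2 misses before they reach memory, so the average access time drops even though the L3 itself is slower than the L2.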

Page 35:

Looking forward to Sandy Bridge

NetBurst -> Core -> Nehalem -> Sandy Bridge

Page 36:

What can we expect...

The Sandy Bridge microchip will have an architecture optimized for 32-nanometer transistors
The Sandy Bridge microarchitecture is also said to focus on the connections of the processor core, such as vertical interconnects and multilevel dies
Increase in FLOPs by using AVX (Advanced Vector Extensions)
Haswell, the successor to Sandy Bridge, will be built on 22 nm
The tick-tock model works just fine!
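The FLOPs increase expected from AVX comes from wider vector registers: 256 bits instead of the 128 bits of SSE. The sketch below is a rough peak estimate only. It assumes single-precision (32-bit) elements and one add plus one multiply issued per lane per cycle; the core count and clock are example values, not specifications.

```python
def simd_lanes(register_bits: int, element_bits: int = 32) -> int:
    """Number of elements processed by one vector instruction."""
    return register_bits // element_bits


def peak_gflops(cores: int, clock_ghz: float, register_bits: int,
                ops_per_lane_per_cycle: int = 2, element_bits: int = 32) -> float:
    """Theoretical peak GFLOPS.

    ops_per_lane_per_cycle = 2 assumes one add and one multiply can be
    issued every cycle, which is an assumption of this sketch.
    """
    lanes = simd_lanes(register_bits, element_bits)
    return cores * clock_ghz * lanes * ops_per_lane_per_cycle


CORES = 4        # assumed example
CLOCK_GHZ = 3.0  # assumed example

sse = peak_gflops(CORES, CLOCK_GHZ, register_bits=128)
avx = peak_gflops(CORES, CLOCK_GHZ, register_bits=256)

print(f"128-bit SSE: {simd_lanes(128)} lanes, {sse:.0f} GFLOPS peak")
print(f"256-bit AVX: {simd_lanes(256)} lanes, {avx:.0f} GFLOPS peak")
```

At the same clock and core count, doubling the register width doubles the theoretical peak; real code reaches it only when it is vectorized and not limited by memory bandwidth.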

Page 37:

Trends and Performance Comparison

Page 38:

Intel Processor Trends

Page 39:

Intel Processor Trends

                          NetBurst               Core                               Nehalem
Cache hierarchy           Two-level              Two-level                          Three-level
Second-level cache size   256 KB–2 MB            1 MB–12 MB                         >1 MB
Third-level cache size    -                      -                                  8 MB
Front-side bus (MHz)      400, 533, 800, 1066    533, 667, 800, 1066, 1333, 1600    (QPI = 6.4 GT/s)
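The bus figures in the last row can be turned into rough peak bandwidths. The widths used below are not on the slide: the sketch assumes a 64-bit (8-byte) FSB data bus and a QPI link carrying 2 bytes of payload per transfer in each direction, and it treats the listed FSB numbers as effective transfer rates.

```python
def fsb_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bytes: int = 8) -> float:
    """Peak FSB bandwidth in GB/s (shared, half-duplex bus).

    bus_width_bytes = 8 assumes a 64-bit data bus.
    """
    return transfer_rate_mt_s * bus_width_bytes / 1000.0


def qpi_bandwidth_gb_s(transfer_rate_gt_s: float,
                       payload_bytes_per_transfer: int = 2,
                       directions: int = 2) -> float:
    """Peak QPI link bandwidth in GB/s, summed over both directions.

    payload_bytes_per_transfer = 2 assumes 16 data bits per direction.
    """
    return transfer_rate_gt_s * payload_bytes_per_transfer * directions


for rate in (400, 800, 1066, 1600):
    print(f"FSB {rate}: {fsb_bandwidth_gb_s(rate):.1f} GB/s")

print(f"QPI 6.4 GT/s: {qpi_bandwidth_gb_s(6.4):.1f} GB/s "
      f"({qpi_bandwidth_gb_s(6.4, directions=1):.1f} GB/s per direction)")
```

Under these assumptions a single QPI link matches the fastest FSB in each direction, and unlike the FSB it does not have to carry memory traffic, since Nehalem's memory controller is on the die.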

Page 40:

Intel Processor Trends

Page 41:

SPEC 2000 benchmark

2003: 3.0 GHz Pentium 4 processor with Hyper-Threading Technology
Primary Cache: 12k micro-ops I + 8 KB D on chip
Secondary Cache: 512 KB (I+D) on chip
Memory: 512 MB

2004: 3.80 GHz Intel Pentium 4 processor 570J
Primary Cache: 12k micro-ops I + 16 KB D on chip
Secondary Cache: 1 MB (I+D) on chip
Memory: 1 GB

2005: 3.73 GHz Intel(R) Pentium(R) 4 processor
Primary Cache: 12k micro-ops I + 16 KB D on chip
Secondary Cache: 2 MB (I+D) on chip
Memory: 1 GB

2006: Intel(R) Core(TM) 2 Extreme processor X6800 (2.93 GHz, 1066 MHz bus)
Primary Cache: 32 KB I + 32 KB D per core, on chip
Secondary Cache: 4 MB (I+D) per chip, on chip (shared)
Memory: 2 GB

Page 42:

SPEC 2006 benchmark

2006: Intel Core 2 Duo E6700, 2.67 GHz, 1066 MHz bus
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 4 MB I+D on chip per chip
Memory: 2 GB

2007: Intel Core 2 Extreme QX9650, 3.00 GHz, 1333 MHz FSB
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 12 MB I+D on chip per chip, 6 MB shared / 2 cores
Memory: 4 GB

2008: Intel Xeon X5270, 3.5 GHz
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 6 MB I+D on chip per chip
Memory: 16 GB

2009: Intel Core i7-965 Extreme Edition, Intel Turbo Boost Technology up to 3.46 GHz
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 256 KB I+D on chip per core
L3 Cache: 8 MB I+D on chip per chip
Memory: 12 GB

Page 43:

Concluding Remarks

Page 44:

Our Views

Focus needs to be on more scalable and robust architecture.
Implementing 3-D integration.
How about a 128-bit processor?
The speed of light problem.
The end of Moore's Law?

Page 45:

REFERENCES

Journals:

Koufaty, D., Marr, D.T., "Hyperthreading technology in the NetBurst microarchitecture", Volume 23, Issue 2, pp. 56–65.

Lu Peng, Jih-Kwon Peir, Prakash, T.K., Yen-Kuang Chen, Koppelman, D., "Memory Performance and Scalability of Intel's and AMD's Dual-Core Processors: A Case Study", Performance, Computing, and Communications Conference (IPCCC 2007), IEEE International, 11-13 April 2007, pp. 55–64.

Kurd, N., Douglas, J., Mosalikanti, P., Kumar, R., "Next generation Intel® micro-architecture (Nehalem) clocking architecture", VLSI Circuits, 2008 IEEE Symposium on, 18-20 June 2008, pp. 62–63.

Varghese George, Sanjeev Jahagirdar, Chao Tong, Smits, Ken, Satish Damaraju, Siers, Scott, Ves Naydenov, Tanveer Khondker, Sanjib Sarkar, Puneet Singh, "Penryn: 45-nm next generation Intel® Core™ 2 processor", Solid-State Circuits Conference (ASSCC '07), IEEE Asian, 12-14 Nov. 2007, pp. 14–17.

Chang, J., Ming Huang, Shoemaker, J., Benoit, J., Szu-Liang Chen, Wei Chen, Siufu Chiu, Ganesan, R., Leong, G., Lukka, V., Rusu, S., Srivastava, D., "The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series", Solid-State Circuits, IEEE Journal of, Volume 42, Issue 4, April 2007, pp. 846–852.

Bin-feng Qian, Li-min Yan, "The research of the inclusive cache used in multi-core processor", Electronic Packaging Technology & High Density Packaging (ICEPT-HDP 2008), International Conference on, 28-31 July 2008, pp. 1–4.

Online References:

www.wikipedia.org
www.intel.com
http://www.hexus.net/content/item.php?item=3824

Page 46:

Question