EE 660: Computer Architecture Lecture 1: Introduction and ...
Transcript of EE 660: Computer Architecture Lecture 1: Introduction and ...
![Page 1: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/1.jpg)
EE 660: Computer Architecture
Lecture 1: Introduction and Instruction Set Architectures
Yao Zheng Department of Electrical Engineering
University of Hawaiʻi at Mānoa
1 Based on the slides of Prof. David Wentzlaff
![Page 2: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/2.jpg)
What is Computer Architecture? Application
2
![Page 3: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/3.jpg)
What is Computer Architecture?
Physics
Application
3
![Page 4: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/4.jpg)
What is Computer Architecture?
Physics
Application
Gap too large to bridge in one step
4
![Page 5: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/5.jpg)
What is Computer Architecture?
In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies
Physics
Application
Gap too large to bridge in one step
5
![Page 6: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/6.jpg)
What is Computer Architecture?
In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies
Physics
Application
Gap too large to bridge in one step
6
![Page 7: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/7.jpg)
Abstractions in Modern Computing Systems
Physics
Devices
Circuits Gates
Register-Transfer Level Microarchitecture
Instruction Set Architecture
Operating System/Virtual Machines Programming Language
Algorithm Application
7
![Page 8: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/8.jpg)
Abstractions in Modern Computing Systems
Physics
Devices
Circuits Gates
Register-Transfer Level Microarchitecture
Instruction Set Architecture
Operating System/Virtual Machines Programming Language
Algorithm Application
Computer Architecture(EE 660)
8
![Page 9: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/9.jpg)
Computer Architecture is Constantly Changing
Physics
Devices
Circuits Gates
Register-Transfer Level Microarchitecture
Instruction Set Architecture
Operating System/Virtual Machines Programming Language
Algorithm Application Application Requirements:
• Suggest how to improve architecture• Provide revenue to fund development
Technology Constraints: • Restrict what can be done efficiently• New technologies make new arch
possible
9
![Page 10: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/10.jpg)
Computer Architecture is Constantly Changing
Physics
Devices
Circuits Gates
Register-Transfer Level Microarchitecture
Instruction Set Architecture
Operating System/Virtual Machines Programming Language
Algorithm Application Application Requirements:
• Suggest how to improve architecture• Provide revenue to fund development
Technology Constraints: • Restrict what can be done efficiently• New technologies make new arch
possible
Architecture provides feedback to guide application and technology research directions
10
![Page 11: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/11.jpg)
Computers Then…
IAS Machine. Design directed by John von Neumann. First booted in Princeton NJ in 1952 Smithsonian Institution Archives (Smithsonian Image 95-06151) 11
![Page 12: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/12.jpg)
Computers Now
12
Robots
Supercomputers Automobiles
Laptops
Set-top boxes
Smart phones
Servers Media Players
Sensor Nets
Routers
Cameras Games
![Page 13: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/13.jpg)
[from Kurzweil]
Major Technology Generations Bipolar
nMOS
CMOS
pMOS
Relays
Vacuum Tubes
Electromechanical
13
![Page 14: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/14.jpg)
Sequential Processor Performance
14 From Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
![Page 15: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/15.jpg)
Sequential Processor Performance
RISC
15 From Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
![Page 16: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/16.jpg)
Sequential Processor Performance
RISC
Move to multi-processor
16 From Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
![Page 17: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/17.jpg)
Course Structure
• Recommended Readings• In-Lecture Questions• Problem Sets
– Very useful for exam preparation
– Peer Evaluation
• Midterm• Final Project
17
![Page 18: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/18.jpg)
Course Content Digital Sys. & Computer Design (EE 361)
Computer Organization • Basic Pipelined
Processor
~50,000 Transistors
Photo of Berkeley RISC I, © University of California (Berkeley) 18
![Page 19: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/19.jpg)
Course Content Computer Architecture (EE 660)
Intel Nehalem Processor, Original Core i7, Image Credit Intel: http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg
19
![Page 20: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/20.jpg)
Intel Nehalem Processor, Original Core i7, Image Credit Intel: http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg
~700,000,000 Transistors 20
Course Content Computer Architecture (EE 660)
![Page 21: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/21.jpg)
Intel Nehalem Processor, Original Core i7, Image Credit Intel: http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg
~700,000,000 Transistors
EE 361 Processor
21
Course Content Computer Architecture (EE 660)
![Page 22: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/22.jpg)
Intel Nehalem Processor, Original Core i7, Image Credit Intel: http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg
~700,000,000 Transistors
• Instruction Level Parallelism– Superscalar– Very Long Instruction Word (VLIW)
• Long Pipelines (PipelineParallelism)
• Advanced Memory and Caches• Data Level Parallelism
– Vector– GPU
• Thread Level Parallelism– Multithreading– Multiprocessor– Multicore– Manycore
22
EE 361 Processor
Course Content Computer Architecture (EE 660)
![Page 23: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/23.jpg)
Architecture vs. Microarchitecture
“Architecture”/Instruction Set Architecture:• Programmer visible state (Memory & Register)• Operations (Instructions and how they work)• Execution Semantics (interrupts)• Input/Output• Data Types/SizesMicroarchitecture/Organization:• Tradeoffs on how to implement ISA for some metric
(Speed, Energy, Cost)• Examples: Pipeline depth, number of pipelines, cache
size, silicon area, peak power, execution ordering, buswidths, ALU widths
23
![Page 24: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/24.jpg)
Software Developments
24
up to 1955 Libraries of numerical routines - Floating point operations- Transcendental functions- Matrix manipulation, equation solvers, . . .
1955-60 High level Languages - Fortran 1956 Operating Systems -
- Assemblers, Loaders, Linkers, Compilers - Accounting programs to keep track of
usage and charges
![Page 25: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/25.jpg)
Software Developments
25
up to 1955 Libraries of numerical routines - Floating point operations- Transcendental functions- Matrix manipulation, equation solvers, . . .
1955-60 High level Languages - Fortran 1956 Operating Systems -
- Assemblers, Loaders, Linkers, Compilers - Accounting programs to keep track of
usage and charges
Machines required experienced operators
• Most users could not be expected to understand these programs, much less write them
• Machines had to be sold with a lot of resident software
![Page 26: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/26.jpg)
Compatibility Problem at IBM
26
By early 1960’s, IBM had 4 incompatible lines of computers!
701 7094650 7074702 70801401 7010
Each system had its own • Instruction set• I/O system and Secondary Storage:
magnetic tapes, drums and disks • assemblers, compilers, libraries,...• market niche business, scientific, real time, ...
![Page 27: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/27.jpg)
Compatibility Problem at IBM
27
By early 1960’s, IBM had 4 incompatible lines of computers!
701 7094650 7074702 70801401 7010
Each system had its own • Instruction set• I/O system and Secondary Storage:
magnetic tapes, drums and disks • assemblers, compilers, libraries,...• market niche business, scientific, real time, ...
![Page 28: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/28.jpg)
Compatibility Problem at IBM
28
By early 1960’s, IBM had 4 incompatible lines of computers!
701 7094650 7074702 70801401 7010
Each system had its own • Instruction set• I/O system and Secondary Storage:
magnetic tapes, drums and disks • assemblers, compilers, libraries,...• market niche business, scientific, real time, ...
IBM 360
![Page 29: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/29.jpg)
IBM 360 : Design Premises Amdahl, Blaauw and Brooks, 1964
29
• The design must lend itself to growth and successormachines
• General method for connecting I/O devices• Total performance - answers per month rather than bits per
microsecond programming aids• Machine must be capable of supervising itself without
manual intervention• Built-in hardware fault checking and locating aids to reduce
down time• Simple to assemble systems with redundant I/O devices,
memories etc. for fault tolerance• Some problems required floating-point larger than 36 bits
![Page 30: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/30.jpg)
30
IBM 360: A General-Purpose Register (GPR) Machine
• Processor State– 16 General-Purpose 32-bit Registers
• may be used as index and base register• Register 0 has some special properties
– 4 Floating Point 64-bit Registers– A Program Status Word (PSW)
• PC, Condition codes, Control flags
• A 32-bit machine with 24-bit addresses– But no instruction contains a 24-bit address!
• Data Formats– 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words
![Page 31: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/31.jpg)
31
IBM 360: A General-Purpose Register (GPR) Machine
• Processor State– 16 General-Purpose 32-bit Registers
• may be used as index and base register• Register 0 has some special properties
– 4 Floating Point 64-bit Registers– A Program Status Word (PSW)
• PC, Condition codes, Control flags
• A 32-bit machine with 24-bit addresses– But no instruction contains a 24-bit address!
• Data Formats– 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words
The IBM 360 is why bytes are 8-bits long today!
![Page 32: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/32.jpg)
32
IBM 360: Initial Implementations Model 30 . . . Model 70
Storage 8K - 64 KB 256K - 512 KB Datapath 8-bit 64-bitCircuit Delay 30 nsec/level 5 nsec/levelLocal Store Main Store Transistor RegistersControl Store Read only 1sec Conventional circuits
IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models. Milestone: The first true ISA designed as portable hardware-software interface!
![Page 33: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/33.jpg)
33
IBM 360: Initial Implementations Model 30 . . . Model 70
Storage 8K - 64 KB 256K - 512 KB Datapath 8-bit 64-bitCircuit Delay 30 nsec/level 5 nsec/levelLocal Store Main Store Transistor RegistersControl Store Read only 1sec Conventional circuits
IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models. Milestone: The first true ISA designed as portable hardware-software interface!
With minor modifications it still survives today!
![Page 34: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/34.jpg)
IBM 360: 47 years later… The zSeries z11 Microprocessor
• 5.2 GHz in IBM 45nm PD-SOI CMOS technology• 1.4 billion transistors in 512 mm2
• 64-bit virtual addressing– original S/360 was 24-bit, and S/370 was 31-bit extension
• Quad-core design• Three-issue out-of-order superscalar pipeline• Out-of-order memory accesses• Redundant datapaths
– every instruction performed in two parallel datapaths andresults compared
• 64KB L1 I-cache, 128KB L1 D-cache on-chip• 1.5MB private L2 unified cache per core, on-chip• On-Chip 24MB eDRAM L3 cache• Scales to 96-core multiprocessor with 768MB of
shared L4 eDRAM
[ IBM, Kevin Shum, HotChips, 2010] Image Credit: IBM
Courtesy of International Business Machines Corporation, © International
Business Machines Corporation. 34
![Page 35: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/35.jpg)
Same Architecture Different Microarchitecture
AMD Phenom X4 • X86 Instruction Set• Quad Core• 125W• Decode 3 Instructions/Cycle/Core• 64KB L1 I Cache, 64KB L1 D Cache• 512KB L2 Cache• Out-of-order• 2.6GHz
Intel Atom • X86 Instruction Set• Single Core• 2W• Decode 2 Instructions/Cycle/Core• 32KB L1 I Cache, 24KB L1 D Cache• 512KB L2 Cache• In-order• 1.6GHz
Image Credit: Intel
Image Credit: AMD 35
![Page 36: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/36.jpg)
Different Architecture Different Microarchitecture
AMD Phenom X4 • X86 Instruction Set• Quad Core• 125W• Decode 3 Instructions/Cycle/Core• 64KB L1 I Cache, 64KB L1 D Cache• 512KB L2 Cache• Out-of-order• 2.6GHz
IBM POWER7 • Power Instruction Set• Eight Core• 200W• Decode 6 Instructions/Cycle/Core• 32KB L1 I Cache, 32KB L1 D Cache• 256KB L2 Cache• Out-of-order• 4.25GHz
Image Credit: IBM Image Credit: AMD
Courtesy of International Business Machines Corporation, © International Business Machines Corporation.
36
![Page 37: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/37.jpg)
Where Do Operands Come from And Where Do Results Go?
37
![Page 38: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/38.jpg)
Where Do Operands Come from And Where Do Results Go?
38
ALU
![Page 39: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/39.jpg)
Where Do Operands Come from And Where Do Results Go?
39
ALU
…
Mem
ory
![Page 40: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/40.jpg)
Proc
esso
r
…
Where Do Operands Come from And Where Do Results Go?
40
ALU
…
Mem
ory
![Page 41: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/41.jpg)
Where Do Operands Come from And Where Do Results Go?
41
![Page 42: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/42.jpg)
Where Do Operands Come from And Where Do Results Go?
42
… TOS
ALU
Proc
esso
r
…
Mem
ory
Stack
![Page 43: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/43.jpg)
Where Do Operands Come from And Where Do Results Go?
43
… TOS
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
…
Mem
ory
Stack Accumulator
![Page 44: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/44.jpg)
Where Do Operands Come from And Where Do Results Go?
44
… TOS
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
…
Mem
ory
… Stack Accumulator
Register- Memory
![Page 45: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/45.jpg)
Where Do Operands Come from And Where Do Results Go?
45
… TOS
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
…
Mem
ory
… Stack Accumulator
Register- Memory
Register- Register
ALU
Proc
esso
r
…
Mem
ory
…
![Page 46: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/46.jpg)
Where Do Operands Come from And Where Do Results Go?
46
… TOS
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
…
Mem
ory
… Stack Accumulator
Register- Memory
Register- Register
0 1 2 or 3 Number Explicitly Named Operands:
ALU
Proc
esso
r
…
Mem
ory
…
2 or 3
![Page 47: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/47.jpg)
Stack-Based Instruction Set Architecture (ISA)
• Burrough’s B5000 (1960)• Burrough’s B6700• HP 3000• ICL 2900• Symbolics 3600Modern• Inmos Transputer• Forth machines• Java Virtual Machine• Intel x87 Floating Point Unit
47
… TOS
ALU
Proc
esso
r
…
Mem
ory
![Page 48: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/48.jpg)
Evaluation of Expressions
48
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
![Page 49: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/49.jpg)
Evaluation of Expressions
49
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
![Page 50: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/50.jpg)
Evaluation of Expressions
50
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack
![Page 51: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/51.jpg)
Evaluation of Expressions
51
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack push a
a
![Page 52: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/52.jpg)
Evaluation of Expressions
52
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack a
![Page 53: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/53.jpg)
b
Evaluation of Expressions
53
(a + b * c) / (a + d * c - e) /
+
* + a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack a
push b
![Page 54: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/54.jpg)
b
Evaluation of Expressions
54
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack a
![Page 55: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/55.jpg)
c b
Evaluation of Expressions
55
(a + b * c) / (a + d * c - e) /
+
* + a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack a
push c
![Page 56: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/56.jpg)
c b
Evaluation of Expressions
56
(a + b * c) / (a + d * c - e) /
+
* + a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack a
![Page 57: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/57.jpg)
c b
Evaluation of Expressions
57
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack a
multiply
* b * c
![Page 58: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/58.jpg)
c b
Evaluation of Expressions
58
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
Evaluation Stack a
b * c
![Page 59: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/59.jpg)
Evaluation of Expressions
59
a
(a + b * c) / (a + d * c - e) /
+
* + a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
add Evaluation Stack
b * c
![Page 60: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/60.jpg)
Evaluation of Expressions
60
a
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
add
+
Evaluation Stack
b * c
![Page 61: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/61.jpg)
Evaluation of Expressions
61
a
(a + b * c) / (a + d * c - e) /
+
* +a e
-
a c
d c
* b
Reverse Polish a b c * + a d c * + e - /
add
+
Evaluation Stack
b * c a + b * c
![Page 62: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/62.jpg)
Hardware organization of the stack
• Stack is part of the processor state stack must be bounded and small
number of Registers,not the size of main memory
• Conceptually stack is unboundeda part of the stack is included in the
processor state; the rest is kept in themain memory
62
![Page 63: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/63.jpg)
Stack Operations and Implicit Memory References
• Suppose the top 2 elements of the stack are keptin registers and the rest is kept in the memory.
Each push operation 1 memory reference pop operation 1 memory reference
No Good! • Better performance by keeping the top N
elements in registers, and memory references aremade only when register stack overflows orunderflows.
Issue - when to Load/Unload registers ?
63
![Page 64: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/64.jpg)
Stack Operations and Implicit Memory References
• Suppose the top 2 elements of the stack are keptin registers and the rest is kept in the memory.
Each push operation 1 memory reference pop operation 1 memory reference
No Good! • Better performance by keeping the top N
elements in registers, and memory references aremade only when register stack overflows orunderflows.
Issue - when to Load/Unload registers ?
64
![Page 65: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/65.jpg)
Stack Operations and Implicit Memory References
• Suppose the top 2 elements of the stack are keptin registers and the rest is kept in the memory.
Each push operation 1 memory reference pop operation 1 memory reference
No Good! • Better performance by keeping the top N
elements in registers, and memory references aremade only when register stack overflows orunderflows.
Issue - when to Load/Unload registers ?
65
![Page 66: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/66.jpg)
Stack Size and Memory References
66
program stack (size = 2) memory refs push a R0 a push b R0 R1 b push c R0 R1 R2 c, ss(a) * R0 R1 sf(a) + R0 push a R0 R1 a push d R0 R1 R2 d, ss(a+b*c) push c R0 R1 R2 R3 c, ss(a) * R0 R1 R2 sf(a) + R0 R1 sf(a+b*c) push e R0 R1 R2 e,ss(a+b*c) - R0 R1 sf(a+b*c) / R0
a b c * + a d c * + e - /
![Page 67: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/67.jpg)
Stack Size and Memory References
67
program stack (size = 2) memory refs push a R0 a push b R0 R1 b push c R0 R1 R2 c, ss(a) * R0 R1 sf(a) + R0 push a R0 R1 a push d R0 R1 R2 d, ss(a+b*c) push c R0 R1 R2 R3 c, ss(a) * R0 R1 R2 sf(a) + R0 R1 sf(a+b*c) push e R0 R1 R2 e,ss(a+b*c) - R0 R1 sf(a+b*c) / R0
a b c * + a d c * + e - /
4 stores, 4 fetches (implicit)
![Page 68: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/68.jpg)
Stack Size and Expression Evaluation
68
program stack (size = 4) push a R0 push b R0 R1 push c R0 R1 R2 * R0 R1 + R0 push a R0 R1 push d R0 R1 R2 push c R0 R1 R2 R3 * R0 R1 R2 + R0 R1 push e R0 R1 R2 - R0 R1 / R0
a b c * + a d c * + e - /
![Page 69: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/69.jpg)
Stack Size and Expression Evaluation
69
program stack (size = 4) push a R0 push b R0 R1 push c R0 R1 R2 * R0 R1 + R0 push a R0 R1 push d R0 R1 R2 push c R0 R1 R2 R3 * R0 R1 R2 + R0 R1 push e R0 R1 R2 - R0 R1 / R0
a b c * + a d c * + e - /
a and c are “loaded” twice
not the best use of registers!
![Page 70: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/70.jpg)
Machine Model Summary
70
… TOS
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
… M
emor
y
ALU
Proc
esso
r
…
Mem
ory
… Stack Accumulator
Register- Memory
Register- Register
ALU
Proc
esso
r
…
Mem
ory
…
![Page 71: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/71.jpg)
Machine Model Summary
71
… TOS
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
… M
emor
y
ALU
Proc
esso
r
…
Mem
ory
… Stack Accumulator
Register- Memory
Register- Register
ALU
Proc
esso
r
…
Mem
ory
…
C = A + B
![Page 72: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/72.jpg)
Machine Model Summary
72
… TOS
ALU
Proc
esso
r
…
Mem
ory
ALU
Proc
esso
r
… M
emor
y
ALU
Proc
esso
r
…
Mem
ory
… Stack Accumulator
Register- Memory
Register- Register
ALU
Proc
esso
r
…
Mem
ory
…
C = A + B
Push A Push B Add Pop C
Load A Add B Store C
Load R1, A Add R3, R1, B Store R3, C
Load R1, A Load R2, B Add R3, R1, R2 Store R3, C
![Page 73: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/73.jpg)
Classes of Instructions • Data Transfer
– LD, ST, MFC1, MTC1, MFC0, MTC0• ALU
– ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI• Control Flow
– BEQZ, JR, JAL, TRAP, ERET• Floating Point
– ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W,• Multimedia (SIMD)
– ADD.PS, SUB.PS, MUL.PS, C.LT.PS• String
– REP MOVSB (x86)73
![Page 74: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/74.jpg)
Addressing Modes: How to Get Operands from Memory
74
Addressing Mode
Instruction Function
Register Add R4, R3, R2 Regs[R4] <- Regs[R3] + Regs[R2] **
Immediate Add R4, R3, #5 Regs[R4] <- Regs[R3] + 5 **
Displacement Add R4, R3, 100(R1) Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1]]
Register Indirect
Add R4, R3, (R1) Regs[R4] <- Regs[R3] + Mem[Regs[R1]]
Absolute Add R4, R3, (0x475) Regs[R4] <- Regs[R3] + Mem[0x475]
Memory Indirect
Add R4, R3, @(R1) Regs[R4] <- Regs[R3] + Mem[Mem[R1]]
PC relative Add R4, R3, 100(PC) Regs[R4] <- Regs[R3] + Mem[100 + PC]
Scaled Add R4, R3, 100(R1)[R5] Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1] + Regs[R5] * 4]
** May not actually access memory!
![Page 75: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/75.jpg)
Data Types and Sizes • Types
– Binary Integer– Binary Coded Decimal (BCD)– Floating Point
• IEEE 754• Cray Floating Point• Intel Extended Precision (80-bit)
– Packed Vector Data– Addresses
• Width– Binary Integer (8-bit, 16-bit, 32-bit, 64-bit)– Floating Point (32-bit, 40-bit, 64-bit, 80-bit)– Addresses (16-bit, 24-bit, 32-bit, 48-bit, 64-bit)
75
![Page 76: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/76.jpg)
ISA Encoding Fixed Width: Every Instruction has same width • Easy to decode(RISC Architectures: MIPS, PowerPC, SPARC, ARM…)Ex: MIPS, every instruction 4-bytesVariable Length: Instructions can vary in width• Takes less space in memory and caches(CISC Architectures: IBM 360, x86, Motorola 68k, VAX…)Ex: x86, instructions 1-byte up to 17-bytesMostly Fixed or Compressed:• Ex: MIPS16, THUMB (only two formats 2 and 4 bytes)• PowerPC and some VLIWs (Store instructions compressed,
decompress into Instruction Cache(Very) Long Instruction Word: • Multiple instructions in a fixed width bundle• Ex: Multiflow, HP/ST Lx, TI C6000 76
![Page 77: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/77.jpg)
ISA Encoding Fixed Width: Every Instruction has same width • Easy to decode (RISC Architectures: MIPS, PowerPC, SPARC, ARM…) Ex: MIPS, every instruction 4-bytes Variable Length: Instructions can vary in width • Takes less space in memory and caches (CISC Architectures: IBM 360, x86, Motorola 68k, VAX…) Ex: x86, instructions 1-byte up to 17-bytes Mostly Fixed or Compressed: • Ex: MIPS16, THUMB (only two formats 2 and 4 bytes) • PowerPC and some VLIWs (Store instructions compressed,
decompress into Instruction Cache (Very) Long Instruction Word: • Multiple instructions in a fixed width bundle • Ex: Multiflow, HP/ST Lx, TI C6000 77
![Page 78: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/78.jpg)
x86 (IA-32) Instruction Encoding
78
Immediate Displacement Scale, Index, Base ModR/M Opcode Instruction
Prefixes
0,1,2, or 4 bytes
0,1,2, or 4 bytes
1 byte (if needed)
1 byte (if needed)
1,2, or 3 bytes
Up to four Prefixes (1 byte each)
x86 and x86-64 Instruction Formats Possible instructions 1 to 18 bytes long
![Page 79: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/79.jpg)
MIPS64 Instruction Encoding
79 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
![Page 80: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/80.jpg)
Real World Instruction Sets
80
Arch Type # Oper # Mem Data Size # Regs Addr Size Use
Alpha Reg-Reg 3 0 64-bit 32 64-bit Workstation
ARM Reg-Reg 3 0 32/64-bit 16 32/64-bit Cell Phones, Embedded
MIPS Reg-Reg 3 0 32/64-bit 32 32/64-bit Workstation, Embedded
SPARC Reg-Reg 3 0 32/64-bit 24-32 32/64-bit Workstation
TI C6000 Reg-Reg 3 0 32-bit 32 32-bit DSP
IBM 360 Reg-Mem 2 1 32-bit 16 24/31/64 Mainframe
x86 Reg-Mem 2 1 8/16/32/64-bit
4/8/24 16/32/64 Personal Computers
VAX Mem-Mem 3 3 32-bit 16 32-bit Minicomputer
Mot. 6800 Accum. 1 1/2 8-bit 0 16-bit Microcontroler
![Page 81: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/81.jpg)
Why the Diversity in ISAs?
Technology Influenced ISA • Storage is expensive, tight encoding important• Reduced Instruction Set Computer
– Remove instructions until whole computer fits on die• Multicore/Manycore
– Transistors not turning into sequential performanceApplication Influenced ISA • Instructions for Applications
– DSP instructions• Compiler Technology has improved
– SPARC Register Windows no longer needed– Compiler can register allocate effectively
81
![Page 82: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/82.jpg)
Recap
82
Physics
Devices
Circuits Gates
Register-Transfer Level Microarchitecture
Instruction Set Architecture
Operating System/Virtual Machines Programming Language
Algorithm Application
Computer Architecture (EE 660)
![Page 83: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/83.jpg)
Recap
• ISA vs Microarchitecture• ISA Characteristics
– Machine Models– Encoding– Data Types– Instructions– Addressing Modes
83
Physics
Devices
Circuits Gates
Register-Transfer Level Microarchitecture
Instruction Set Architecture
Operating System/Virtual Machines Programming Language
Algorithm Application
![Page 84: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/84.jpg)
Computer Architecture Lecture 1
Next Class: Microcode and Review of Pipelining
84
![Page 85: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/85.jpg)
1
Agenda
• MicrocodedMicroarchitectures• Pipeline Review– Pipelining Basics– Structural Hazards– Data Hazards– Control Hazards
![Page 86: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/86.jpg)
2
Agenda
• MicrocodedMicroarchitectures• Pipeline Review– Pipelining Basics– Structural Hazards– Data Hazards– Control Hazards
![Page 87: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/87.jpg)
3
WhatHappensWhentheProcessorisToo Large?
![Page 88: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/88.jpg)
4
WhatHappensWhentheProcessorisToo Large?
• TimeMultiplex Resources!
![Page 89: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/89.jpg)
Microcontrol Unit MauriceWilkes, 1954
Embed the controllogic state table ina memory array
Matrix A Matrix B
Decoder
Next state
op conditional code flip-flop
µ address
Control lines toALU, MUXs, Registers
Firstusedin EDSAC-2,completed 1958
Memory
5
![Page 90: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/90.jpg)
Microcoded Microarchitecture
Memory (RAM)
µcontroller(ROM)
Datapath
Data Addr
busy? zero?
opcode
enMem MemWrt
holds fixedmicrocode instructions
holds user program written in macrocode
instructions (e.g., x86, MIPS, etc.)
6
![Page 91: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/91.jpg)
ABus-basedDatapathfor RISC
Microinstruction: register to register transfer (17 control signals)
enMem
MA
addr
Memory
data
busy ldMA
MemWrt
Bus 32
A B
OpSel ldA
bcompare? ldB
ALU
2
RegWrt enReg
data
rs1rs2rd32(PC)1(Link)
RegSel
addr 32 GPRs+ PC ...
32-bit Reg
3
rs1rs2rd
ImmSel
IR
Opcode ldIR
Imm ALUExt control
enALUenImm
2
7
![Page 92: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/92.jpg)
8
Agenda
• MicrocodedMicroarchitectures• Pipeline Review– Pipelining Basics– Structural Hazards– Data Hazards– Control Hazards
![Page 93: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/93.jpg)
AnIdeal Pipeline
• Allobjectsgothroughthesame stages• Nosharingofresourcesbetweenanytwo stages• Propagationdelaythroughallpipelinestagesis equal• Schedulingofatransactionenteringthepipelineisnotaffectedbythetransactionsinother stages
stage 1
stage 2
stage3
stage 4
10
![Page 94: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/94.jpg)
AnIdeal Pipeline
• Allobjectsgothroughthesame stages• Nosharingofresourcesbetweenanytwo stages• Propagationdelaythroughallpipelinestagesis equal• Schedulingofatransactionenteringthepipelineisnotaffectedbythetransactionsinother stages
• Theseconditionsgenerallyholdforindustryassemblylines,butinstructionsdependoneachothercausingvarious hazards
stage 1
stage 2
stage3
stage 4
10
![Page 95: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/95.jpg)
UnpipelinedDatapathfor MIPS
clk
WBSrcRegWrite MemWrite
rdataDataMemorywdata
weaddr
RegDst BSrcExtSelOpCode
z
OpSel
0x4
AddAdd
clk
zero?
clk
addrinst
Inst.Memory
PC
wers1rs2
rd1wswd rd2GPRs
ImmExt
ALU
ALUControl
31
PCSrcbrrindjabspc+4
11
![Page 96: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/96.jpg)
SimplifiedUnpipelined Datapath
rdataDataMemorywdata
we addrALU
ImmExt
0x4Add
addrrdata
Inst.Memory
wers1 rs2
rd1wswd rd2GPRs
PC
12
![Page 97: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/97.jpg)
Pipelined Datapath
Clockperiodcanbereducedbydividingtheexecutionofaninstructionintomultiple cycles
write-backphase
fetchphase
executephase
decode& register-fetch phase
memoryphase
rdataDataMemorywdata
we addrALU
ImmExt
0x4Add
addrrdata
Inst.Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
tC>max{tIM,tRF,tALU,tDM,tRW}(=tDM probably)14
![Page 98: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/98.jpg)
Pipelined Control
write-backfetch
phaseexecutephase
decode& register-fetch phase
memoryphase
we addr
rdataDataMemorywdata
ALU
ImmExt
0x4Add
addrrdata
Inst. Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
phaseClockperiodcanbereducedbydividingtheexecutionof aninstructionintomultiple cycles
tC>max{tIM,tRF,tALU,tDM,tRW}(=tDM probably)14
![Page 99: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/99.jpg)
Pipelined Control
write-backfetch
phaseexecutephase
decode& register-fetch phase
memoryphase
we addr
rdataDataMemorywdata
ALU
ImmExt
0x4Add
addrrdata
Inst. Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
HardwiredController
phaseClockperiodcanbereducedbydividingtheexecutionof aninstructionintomultiple cycles
tC>max{tIM,tRF,tALU,tDM,tRW}(=tDM probably)15
![Page 100: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/100.jpg)
Pipelined Control
Clockperiodcanbereducedbydividingtheexecutionofaninstructionintomultiple cycles
write-backphase
fetchphase
executephase
decode& register-fetch phase
memoryphase
rdataDataMemorywdata
we addrALU
ImmExt
0x4Add
addrrdata
Inst.Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
HardwiredController
tC>max{tIM,tRF,tALU,tDM,tRW}(=tDM probably)
However,CPIwillincreaseunlessinstructionsare pipelined 17
![Page 101: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/101.jpg)
Pipelined Control
write-backfetch
phaseexecutephase
decode& register-fetch phase
memoryphase
we addr
rdataDataMemorywdata
ALU
ImmExt
0x4Add
addrrdata
Inst. Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
HardwiredController
phaseClockperiodcanbereducedbydividingtheexecutionof aninstructionintomultiple cycles
tC>max{tIM,tRF,tALU,tDM,tRW}(=tDM probably)
However,CPIwillincreaseunlessinstructionsare pipelined 18
![Page 102: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/102.jpg)
10
“IronLaw”ofProcessor PerformanceTime = Instructions Cycles Time
Program Program * Instruction * Cycle
Microarchitecture CPI cycle timeMicrocoded >1 shortSingle-cycle unpipelined 1 longPipelined 1 short
–Instructionsperprogramdependsonsourcecode,compilertechnology,andISA
–Cyclesperinstructions(CPI)dependsuponthe ISAandthe microarchitecture
–Timepercycledependsuponthemicroarchitectureandthebase technology
![Page 103: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/103.jpg)
20
“IronLaw”ofProcessor PerformanceTime = Instructions Cycles Time
Program Program * Instruction * Cycle–Instructionsperprogramdependsonsourcecode,compilertechnology,andISA
–Cyclesperinstructions(CPI)dependsuponthe ISAandthe microarchitecture
–Timepercycledependsuponthemicroarchitectureandthebase technology
Microarchitecture CPI cycle timeMicrocoded >1 shortSingle-cycle unpipelined 1 longPipelined 1 shortMulti-cycle,unpipelined control
![Page 104: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/104.jpg)
20
“IronLaw”ofProcessor PerformanceTime = Instructions Cycles Time
Program Program * Instruction * Cycle
Microarchitecture CPI cycle timeMicrocoded >1 shortSingle-cycle unpipelined 1 longPipelined 1 shortMulti-cycle,unpipelined control >1 short
–Instructionsperprogramdependsonsourcecode,compilertechnology,andISA
–Cyclesperinstructions(CPI)dependsuponthe ISAandthe microarchitecture
–Timepercycledependsuponthemicroarchitectureandthebase technology
![Page 105: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/105.jpg)
CPI ExamplesTime
7 cycles
Inst 1
5 cycles
Inst 2
10 cycles
Inst 3
Microcodedmachine
Inst 1 Inst 2 Inst 3
3instructions,22cycles, CPI=7.33Unpipelinedmachine
3instructions,3cycles, CPI=1Pipelinedmachine
3instructions,3cycles, CPI=1
21
![Page 106: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/106.jpg)
22
Technology Assumptions
Thus,thefollowingtimingassumptionis reasonable
• Asmallamountofveryfastmemory (caches)backedupbyalarge,slower memory
• FastALU(atleastfor integers)• MultiportedRegisterfiles (slower!)
tIM » tRF » tALU » tDM » tRW
A5-stagepipelinewillbethefocusofourdetailed design
- somecommercialdesignshaveover30 pipelinestagestodoaninteger add!
![Page 107: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/107.jpg)
Pipeline Diagrams
write-backphase
fetchphase
executephase
decode& register-fetch phase
memoryphase
rdataDataMemorywdata
we addrALU
ImmExt
0x4Add
addrrdata
Inst.Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
HardwiredController
Weneedsomewaytoshowmultiplesimultaneoustransactionsinbothspaceand time
23
![Page 108: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/108.jpg)
PipelineDiagrams:Transactionsvs. Time
25
write-backphase
fetchphase
executephase
decode& register-fetch phase
memoryphase
we addr
rdataDataMemorywdata
ALU
ImmExt
0x4Add
addrrdata
Inst. Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
HardwiredController
time t0 t1 t2 t3 t4 t5 t6 t7 . . . .instruction1 IF1 ID1 EX1 MA1 WB1instruction2 IF2 ID2 EX2instruction3 IF3 ID3instruction4 IF4instruction5
MA2EX3ID4IF5
WB2MA3 WB3EX4 MA4 WB4ID5 EX5 MA5 WB5
![Page 109: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/109.jpg)
PipelineDiagrams:Spacevs. Time
26
write-backphase
fetchphase
executephase
decode& register-fetch phase
memoryphase
rdataDataMemorywdata
we addrALU
ImmExt
0x4Add
addrrdata
Inst.Memory
wers1 rs2
rd1wswd rd2GPRs
IRPC
HardwiredController
t5 t6 t7 . . . .t0 t1 t2 t3 I1 I2 I3 I4
I1 I2 I3I1 I2
I1
time IF ID EX MA WB
t4I5I4I3I2I1
I5I4 I5I3 I4 I5I2 I3 I4 I5
Res
ourc
es
![Page 110: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/110.jpg)
26
InstructionsInteractWithEach Otherin Pipeline
• Structural Hazard: An instruction in thepipeline needs a resource being used byanother instruction in the pipeline
• DataHazard:Aninstructiondependsonadatavalueproducedbyanearlier instruction
• ControlHazard:Whetherornotaninstructionshouldbeexecuteddependsonacontroldecisionmadebyanearlier instruction
![Page 111: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/111.jpg)
27
Agenda
• MicrocodedMicroarchitectures• Pipeline Review– Pipelining Basics– Structural Hazards– Data Hazards– Control Hazards
![Page 112: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/112.jpg)
28
OverviewofStructural Hazards• Structuralhazardsoccurwhentwoinstructions needthesamehardwareresourceatthesame time
• Approachestoresolvingstructural hazards– Schedule:Programmerexplicitlyavoidsschedulinginstructionsthatwouldcreatestructural hazards
– Stall:Hardwareincludescontrollogicthatstalls untilearlierinstructionisnolongerusingcontended resource
– Duplicate:Addmorehardwaretodesignsothateachinstructioncanaccessindependentresourcesatthesametime
• Simple5-stageMIPSpipelinehasnostructural hazardsspecificallybecauseISAwasdesignedthat way
![Page 113: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/113.jpg)
ExampleStructural Hazard:Unified Memory
WBIF EXID MEM
rdataDataMemorywdata
we addrALU
ImmExt
0x4Add
addrrdata
Inst.Memory
wers1 rs2
rd1wswd rd2 GPRs
IRPC
30
![Page 114: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/114.jpg)
ExampleStructural Hazard:Unified Memory
WBIF EXID MEM
Unified Memory
rdataDataMemorywdata
we addrALU
Imm Ext
0x4
addrrdata
Inst. Memory
rs2rd1
wswd rd2 GPRs
Addwe
rs1
IRPC
addr
rdatawdata
30
![Page 115: EE 660: Computer Architecture Lecture 1: Introduction and ...](https://reader034.fdocuments.net/reader034/viewer/2022042421/62602c725bb0fa62f228a3e2/html5/thumbnails/115.jpg)
ExampleStructural Hazard:2-Cycle Memory
WBIF EXID M M
adddata
DatMe orywd a
we rramat
E
ALU
ImmExt
0x4Add
addrrdata
Inst.Memory
wers1 rs2
rd1wswd rd2 GPRs
IRPC
M0Stage
31
M1Stage