ECE 552 / CPS 550 Advanced Computer Architecture I Lecture...
ECE 552 / CPS 550
Advanced Computer Architecture I
Lecture 2: CISC and Microcoding
Benjamin Lee
Electrical and Computer Engineering
Duke University
www.duke.edu/~bcl15
www.duke.edu/~bcl15/class/class_ece252fall12.html

Microarchitectures

Instruction Set Architecture (ISA)
- Defines the hardware-software interface
- Provides convenient functionality to higher levels (e.g., software)
- Provides efficient implementation to lower levels (e.g., hardware)

Microarchitecture
- Microarchitecture implements ISA
- Implementation abstracted from programmer

Early Microarchitectures
- Stack Machines (1960s)
- Microprogrammed Machines (1970s-1980s)
- Reduced Instruction Set Computing (1990s)

Stack Machines

Stack Machine Overview

• Stack operations (e.g., push/pop)
• Computation (e.g., +, -, …)
• Data pointer (DP) to access element in data area
• Program counter (PC) to access instruction in code area
• Stack pointer (SP) to access, move element in stack frame

[Figure: stack machine with a stack (SP), a data area (DP) holding a, b, c, a code area (PC) holding push a, push b, push c, *, +, push e, /, and functional units (+, -, …)]

Stack Machine Overview

Processor state includes stack
• Typical operations: push, pop, +, *, ...
• Instructions, like +, implicitly specify top 2 stack elements as operands.

Stack contents (top first):
a → push b → b a → push c → c b a → pop → b a

Expression Evaluation

(a + b * c) / (a + d * c - e)

[Figure: expression tree with / at the root; left subtree a + (b * c); right subtree (a + (d * c)) - e]

Reverse Polish: a b c * + a d c * + e - /

Evaluation Stack: push a, push b, push c leave a, b, c on the stack; multiply (*) replaces b and c with b * c
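
The evaluation-stack walk-through above can be mimicked in a few lines of Python. This is a minimal sketch rather than a model of any particular stack machine; the function name `eval_rpn` and the sample operand values are assumptions chosen for illustration.

```python
import operator

# Binary operators the sketch understands; each pops two operands and pushes one result.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_rpn(program, data):
    """Evaluate a Reverse Polish token string against named operands in `data`."""
    stack = []
    for token in program.split():
        if token in OPS:
            right = stack.pop()            # top of stack is the right-hand operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(data[token])      # "push x": copy x from the data area
    assert len(stack) == 1, "a well-formed expression leaves exactly one value"
    return stack[0]

# Hypothetical operand values, chosen so the result is easy to check by hand.
values = {"a": 2, "b": 3, "c": 4, "d": 5, "e": 6}
result = eval_rpn("a b c * + a d c * + e - /", values)
# (2 + 3*4) / (2 + 5*4 - 6) = 14 / 16
print(result)
```

Operators never name their operands; they always take the top two stack entries, which is the implicit addressing the slides describe.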

Stack Hardware Organization

Stack as Processor State
- Processor state includes part of stack
- Processor state is bounded, small → tens of stack elements in registers
- Remainder of stack in main memory

Stacks and Memory References
- Option 1: Top 2 stack elements in registers, remainder in memory
  - Each push/pop requires memory reference; poor performance
- Option 2: Top N stack elements in registers, remainder in memory
  - Overflows/underflows require memory reference; better performance
  - Analogous to register spilling

Stack Size & Memory References

(a+b*c)/(a+d*c-e) → a b c * + a d c * + e - /

program   stack (size = 2)   memory refs
push a    R0                 a
push b    R0 R1              b
push c    R0 R1 R2           c, ss(a)
*         R0 R1              sf(a)
+         R0
push a    R0 R1              a
push d    R0 R1 R2           d, ss(a+b*c)
push c    R0 R1 R2 R3        c, ss(a)
*         R0 R1 R2           sf(a)
+         R0 R1              sf(a+b*c)
push e    R0 R1 R2           e, ss(a+b*c)
-         R0 R1              sf(a+b*c)
/         R0

• 4 stack stores (ss), 4 stack fetches (sf)
• Implicit memory references when program stack exceeds processor capacity
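
The store and fetch counts in the table can be reproduced with a small counter. The sketch below assumes the top `num_regs` stack elements live in registers, that a push onto full registers spills one element (a stack store), and that a binary operation refills one element from memory whenever part of the stack is still spilled (a stack fetch). The name `count_stack_traffic` is hypothetical.

```python
def count_stack_traffic(program, num_regs):
    """Count implicit stack stores (ss) and stack fetches (sf) for an RPN program."""
    operators = {"+", "-", "*", "/"}
    depth = 0        # logical stack depth, registers plus memory
    stores = 0
    fetches = 0
    for token in program.split():
        if token in operators:
            # Binary op: two operands popped, one result pushed.
            depth -= 1
            if depth >= num_regs:
                # Part of the stack was still in memory: fetch one element back.
                fetches += 1
        else:
            if depth >= num_regs:
                # Registers are full: spill the deepest register to memory.
                stores += 1
            depth += 1
    return stores, fetches

RPN = "a b c * + a d c * + e - /"
print(count_stack_traffic(RPN, 2))   # two stack registers
print(count_stack_traffic(RPN, 4))   # four stack registers
```

Under these assumptions, two registers give the 4 stores and 4 fetches listed above, and four registers give none, since the expression never needs more than four stack entries.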

Stack Size & Evaluating Expressions

(a+b*c)/(a+d*c-e) → a b c * + a d c * + e - /

program   stack (size = 4)
push a    R0
push b    R0 R1
push c    R0 R1 R2
*         R0 R1
+         R0
push a    R0 R1
push d    R0 R1 R2
push c    R0 R1 R2 R3
*         R0 R1 R2
+         R0 R1
push e    R0 R1 R2
-         R0 R1
/         R0

• Note a and c are loaded twice → inefficient register use

vs General Purpose Register File

(a+b*c)/(a+d*c-e) → a b c * + a d c * + e - /

• Use registers efficiently with explicit names
• Reduce unnecessary loads, stores
• Require fewer registers but longer instructions

    Load R0 a
    Load R1 c
    Load R2 b
    Mul  R2 R1     Reuse R2, store result into R2
    Add  R2 R0     Reuse R2, …
    Load R3 d
    Mul  R3 R1     Reuse R3, store result into R3
    Add  R3 R0     Reuse R3, …
    Load R0 e      Reuse R0, …
    Sub  R3 R0     Reuse R3, …
    Div  R2 R3     Reuse R2, …
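
To check that the register program computes the same expression, it can be run through a tiny interpreter. This is a sketch under assumptions: the two-operand semantics (`Op Rx Ry` means `Rx ← Rx op Ry`) are inferred from the "store result into R2" annotation, and `run_register_program` and the operand values are hypothetical.

```python
def run_register_program(program, memory, num_regs=4):
    """Interpret Load/Mul/Add/Sub/Div lines; two-operand ops overwrite their first register."""
    regs = [0] * num_regs
    loads = 0
    ops = {
        "Mul": lambda x, y: x * y,
        "Add": lambda x, y: x + y,
        "Sub": lambda x, y: x - y,
        "Div": lambda x, y: x / y,
    }
    for line in program.strip().splitlines():
        op, dst, src = line.split()
        d = int(dst[1:])                  # "R2" -> 2
        if op == "Load":
            regs[d] = memory[src]         # explicit memory reference
            loads += 1
        else:
            regs[d] = ops[op](regs[d], regs[int(src[1:])])
    return regs, loads

PROGRAM = """
Load R0 a
Load R1 c
Load R2 b
Mul R2 R1
Add R2 R0
Load R3 d
Mul R3 R1
Add R3 R0
Load R0 e
Sub R3 R0
Div R2 R3
"""

values = {"a": 2, "b": 3, "c": 4, "d": 5, "e": 6}   # hypothetical operand values
regs, loads = run_register_program(PROGRAM, values)
print(regs[2], loads)
```

The program issues five loads, compared with seven pushes in the stack version, because a and c stay in named registers and are reused.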

vs General Purpose Registers

- Amdahl, Blaauw, and Brooks. “Architecture of the IBM System/360.” 1964
“In the final analysis, the stack organization would have been about break-even…the general-purpose objective weighed heavily in favor of the more flexible addressed register organization.”
1. Stack machine’s advantage is from fast registers, not how they’re used
2. Surfacing instructions, which bring submerged data to active positions, is profitable 50% of the time due to repeated operands.
3. Code density for stacks, register files is comparable
4. Stack depth limited by number of fast registers; requires stack management
5. Stacks benefit recursive sub-routines but require independently addressed stacks (SP management)
6. Fitting variable-length fields into fixed-width stack is awkward

Transition from Stack Machines

Stack machines' popularity faded in the 1980s

Code Density
• Stack programs are not necessarily smaller
• Consider frequent stack surfacing instructions
• Consider short register addresses

Advent of Modern Compilers
• Stack machines require managing finite capacity
• Register allocation improves register use
• Early language-directed architectures did not account for compilers

Microprogrammed Machines

Microprogrammed Machines

Why do we care?
• Illustrate small processors with complex instruction sets (CISC)
• Understand part of modern machines (x86, PowerPC, IBM 360)

ISA favors particular microarchitecture style
• Complex Instruction Set Computer (CISC) → microcoded
• Reduced Instruction Set Computer (RISC) → hardwired, pipelined
• Very Long Instruction Word (VLIW) → fixed latency, in-order pipelines
• Java Virtual Machine (JVM) → software interpretation

ISA can use any microarchitecture style
• Core 2 Duo: CISC (x86) with hardwired pipeline and microcode
• This Lecture: RISC (MIPS) with microcode

Hardware Organization

• Structure → How are components connected? Statically
• Behavior → How does data move between components? Dynamically

[Figure: controller and datapath, connected by control lines (controller to datapath) and status lines (datapath to controller)]

Microcoded Microarchitecture

[Figure: µcontroller (ROM), datapath, and memory (RAM). The µcontroller holds fixed microcode instructions; the memory holds user written macrocode instructions (e.g., MIPS, x86, etc.). Signals shown: Addr, Data, zero?, busy?, opcode, enMem, MemWrt]

MIPS32 ISA

See Hennessy and Patterson, Appendix for full description

Processor State
• 32 32-bit GPRs, R0 always contains a 0
• 16 double-precision, 32 single-precision FPRs
• FP status register, used for FP compares & exceptions
• PC, the program counter
• Other special registers

Data Types
• 8-bit byte, 16-bit half word
• 32-bit word for integers
• 32-bit word for single-precision floating-point
• 64-bit word for double-precision floating-point

Load/Store ISA
- Data addressing modes: immediate & indexed
- Branch addressing modes: PC relative & register indirect
- Byte addressable memory: big-endian mode

Instructions are 32 bits

MIPS Instruction Formats

Format   Fields (bit widths)                                         Meaning
ALU      0 (6) | rs (5) | rt (5) | rd (5) | 0 (5) | func (6)         (rd) ← (rs) func (rt)
ALUi     opcode (6) | rs (5) | rt (5) | immediate (16)               (rt) ← (rs) op immediate
Mem      opcode (6) | rs (5) | rt (5) | displacement (16)            M[(rs) + displacement]
Branch   opcode (6) | rs (5) | (5) | offset (16)                     BEQZ, BNEZ
Jump     opcode (6) | rs (5) | (5) | (16)                            JR, JALR
Jump     opcode (6) | offset (26)                                    J, JAL
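
The field widths above translate directly into shifts and masks. The sketch below assumes the fields are packed from the most significant bit downward in the order listed (opcode in bits 31:26, and so on); `decode` and `sext16` are hypothetical helper names, and the sample word is made up to exercise the ALU format.

```python
def decode(word):
    """Split a 32-bit instruction word into the fields used by the formats above."""
    return {
        "opcode": (word >> 26) & 0x3F,     # bits 31:26
        "rs":     (word >> 21) & 0x1F,     # bits 25:21
        "rt":     (word >> 16) & 0x1F,     # bits 20:16
        "rd":     (word >> 11) & 0x1F,     # bits 15:11 (ALU format)
        "func":   word & 0x3F,             # bits 5:0   (ALU format)
        "imm":    word & 0xFFFF,           # bits 15:0  (ALUi, Mem, branch formats)
        "offset": word & 0x3FFFFFF,        # bits 25:0  (J, JAL format)
    }

def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    return imm - 0x10000 if imm & 0x8000 else imm

# ALU-format word with opcode 0, rs = 1, rt = 2, rd = 3, func = 0x20.
fields = decode(0x00221820)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["func"]))
print(sext16(0xFFFC))
```

Every format shares the opcode position, which is what lets the controller dispatch on the opcode before it knows how to interpret the remaining bits.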

Bus-Based MIPS Datapath

Microinstruction specifies register to register transfer (17 control signals)
• PC → MA means RegSel=PC; enReg=yes; ldMA=yes
• Reg[rt] → B means RegSel=rt; enReg=yes; ldB=yes

[Figure: datapath built around a 32-bit bus. Components and their control signals: IR (ldIR, Opcode); ImmExt (ExtSel, enImm); ALU with input registers A and B (ldA, ldB, OpSel, ALUCtrl, enALU, zero?); register file with 32 GPRs and PC (RegSel chooses among rs, rt, rd, 32 (PC), 31 (Link); RegWrt, enReg); memory with address register MA (ldMA, MemWrt, enMem, busy)]

Executing Instructions

1. Fetch instruction from memory @ PC
2. Decode register
3. Perform ALU operation
4. Access memory (optional)
5. Write register (optional)
+ compute next PC

Macroinstruction → Microprogram

instr fetch:  MA ← PC                 # fetch current instr
              A ← PC                  # next PC calculation
              IR ← Memory
              PC ← A + 4
              dispatch on Opcode      # start microcode

ALU:          A ← Reg[rs]
              B ← Reg[rt]
              Reg[rd] ← func(A,B)
              do instruction fetch

ALUi:         A ← Reg[rs]
              B ← Imm                 # sign extension
              Reg[rt] ← Opcode(A,B)
              do instruction fetch

Macroinstruction → Microprogram

LW:           A ← Reg[rs]             # compute address
              B ← Imm
              MA ← A + B
              Reg[rt] ← Memory        # load from memory
              do instruction fetch

J:            A ← PC
              B ← IR
              PC ← JumpTarg(A,B)      # JumpTarg(A,B) = {A[31:28],B[25:0],00}
              do instruction fetch

beqz:         A ← Reg[rs]
              If zero?(A) then go to bz-taken
              do instruction fetch

bz-taken:     A ← PC
              B ← Imm << 2
              PC ← A + B
              do instruction fetch
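
The two next-PC computations in these microprograms are plain bit manipulation. The following sketch writes them out; `jump_targ` and `branch_targ` are hypothetical names, and the branch helper assumes the immediate is sign-extended before the shift, as the ALUi comment suggests.

```python
MASK32 = 0xFFFFFFFF

def jump_targ(a, b):
    """JumpTarg(A,B) = {A[31:28], B[25:0], 00}: keep the top 4 bits of A, append B's low 26 bits and two zeros."""
    return (a & 0xF0000000) | ((b & 0x03FFFFFF) << 2)

def branch_targ(pc, imm16):
    """bz-taken: PC <- A + B with A = PC and B = Imm << 2 (Imm assumed sign-extended)."""
    imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc + (imm << 2)) & MASK32

# A holds the already-incremented PC; B holds the whole instruction register.
print(hex(jump_targ(0x40000008, 0x08000010)))
print(hex(branch_targ(0x1000, 0x0003)))
print(hex(branch_targ(0x1000, 0xFFFF)))
```

Because instruction fetch has already executed PC ← A + 4, both computations are relative to the address of the following instruction.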

Microprogram in the ROM

State    Op    zero?  busy  Control points          next-state
fetch0   *     *      *     MA ← PC                 fetch1
fetch1   *     *      yes   ....                    fetch1
fetch1   *     *      no    IR ← Memory             fetch2
fetch2   *     *      *     A ← PC                  fetch3
fetch3   *     *      *     PC ← A + 4              ?

ALU0     *     *      *     A ← Reg[rs]             ALU1
ALU1     *     *      *     B ← Reg[rt]             ALU2
ALU2     *     *      *     Reg[rd] ← func(A,B)     fetch0

fetch3   ALU   *      *     PC ← A + 4              ALU0

Microprogram in the ROM

State    Op    zero?  busy  Control points          next-state
fetch0   *     *      *     MA ← PC                 fetch1
fetch1   *     *      yes   ....                    fetch1
fetch1   *     *      no    IR ← Memory             fetch2
fetch2   *     *      *     A ← PC                  fetch3
fetch3   ALU   *      *     PC ← A + 4              ALU0
fetch3   ALUi  *      *     PC ← A + 4              ALUi0
fetch3   LW    *      *     PC ← A + 4              LW0
fetch3   SW    *      *     PC ← A + 4              SW0
fetch3   J     *      *     PC ← A + 4              J0
fetch3   JAL   *      *     PC ← A + 4              JAL0
fetch3   JR    *      *     PC ← A + 4              JR0
fetch3   JALR  *      *     PC ← A + 4              JALR0
fetch3   beqz  *      *     PC ← A + 4              beqz0
...
ALU0     *     *      *     A ← Reg[rs]             ALU1
ALU1     *     *      *     B ← Reg[rt]             ALU2
ALU2     *     *      *     Reg[rd] ← func(A,B)     fetch0

Microprogram in the ROM

State    Op    zero?  busy  Control points          next-state
ALUi0    *     *      *     A ← Reg[rs]             ALUi1
ALUi1    sExt  *      *     B ← sExt16(Imm)         ALUi2
ALUi1    uExt  *      *     B ← uExt16(Imm)         ALUi2
ALUi2    *     *      *     Reg[rt] ← Op(A,B)       fetch0
...
J0       *     *      *     A ← PC                  J1
J1       *     *      *     B ← IR                  J2
J2       *     *      *     PC ← JumpTarg(A,B)      fetch0
...
beqz0    *     *      *     A ← Reg[rs]             beqz1
beqz1    *     yes    *     A ← PC                  beqz2
beqz1    *     no     *     ....                    fetch0
beqz2    *     *      *     B ← sExt16(Imm)         beqz3
beqz3    *     *      *     PC ← A+B                fetch0
...

JumpTarg(A,B) = {A[31:28],B[25:0],00}

Size of Control Store

[Figure: Control ROM. The address input is formed from status & opcode (w bits) and the µPC (s bits); the data output supplies the next µPC (s bits) and the control signals (c bits)]

size = 2^(w+s) x (c + s)
w = 6 (opcode) + 2 (status), c = 17 (signals), s = ?

steps per opcode = 4 to 6 + fetch-sequence
states = (4 steps per op-group) x op-groups + shared microprograms
       = 4 x 8 + 10 = 42 → s = 6

Control ROM = 2^(8+6) x 23 bits ≈ 48 Kbytes
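
The sizing arithmetic can be replayed step by step. The sketch below uses only the numbers given on the slide; the function name `control_store_bits` is hypothetical.

```python
def control_store_bits(w, s, c):
    """size = 2^(w+s) x (c + s): one ROM row per (input, state) combination."""
    return 2 ** (w + s) * (c + s)

states = 4 * 8 + 10                 # 4 steps x 8 op-groups + 10 shared states
s = (states - 1).bit_length()       # bits needed to number 42 states
w = 6 + 2                           # opcode bits + status bits
c = 17                              # control signals

bits = control_store_bits(w, s, c)
print(states, s)
print(bits, bits // 8, bits / 8 / 1024)
```

Packed tightly, the 16K rows of 23 bits come to 46 KiB. The 48 Kbytes quoted on the slide matches rounding each row up to three bytes, although that reading is an assumption rather than something the slide states.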

Reducing Size of Control Store

Control store must be fast, which is expensive

Reduce ROM height (= address bits)
• reduce inputs by extra external logic
  - each input bit doubles the size of the control store
• reduce states by grouping opcodes
  - find common sequences of actions
• condense input status bits
  - combine all exceptions into one, i.e., exception/no-exception

Reduce ROM width
• restrict the next-state encoding
  - next, dispatch on opcode, wait for memory, ...
• encode control signals (vertical microcode)

MIPS Microcontroller v2

[Figure: the µPC (state) addresses the Control ROM, whose data output supplies the 17 control signals, the µJumpType, and an absolute address. Jump logic takes the µJumpType together with zero and busy and produces µPCSrc, which selects the next µPC from µPC, µPC+1, absolute, or op-group. The op-group comes from the opcode; this input encoding reduces ROM height]

µJumpType = next | spin | fetch | dispatch | feqz | fnez

Jump Logic

µPCSrc = Case µJumpTypes
    next      → µPC+1
    spin      → if (busy) then µPC else µPC+1
    fetch     → absolute
    dispatch  → op-group
    feqz      → if (zero) then absolute else µPC+1
    fnez      → if (zero) then µPC+1 else absolute
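
The case statement above maps one-to-one onto a small selection function. This is a sketch for illustration only; `next_upc` and its parameter names are hypothetical, and the state numbers in the usage lines are made up.

```python
def next_upc(jump_type, upc, absolute, op_group, zero, busy):
    """Select the next microprogram counter value from the jump type and status bits."""
    if jump_type == "next":
        return upc + 1
    if jump_type == "spin":
        return upc if busy else upc + 1        # wait in place while memory is busy
    if jump_type == "fetch":
        return absolute                        # jump back to the fetch sequence
    if jump_type == "dispatch":
        return op_group                        # entry point chosen by the opcode
    if jump_type == "feqz":
        return absolute if zero else upc + 1
    if jump_type == "fnez":
        return upc + 1 if zero else absolute
    raise ValueError("unknown jump type: " + jump_type)

# Made-up state numbers: current state 5, absolute target 0, op-group entry 12.
print(next_upc("spin", 5, 0, 12, zero=False, busy=True))
print(next_upc("spin", 5, 0, 12, zero=False, busy=False))
print(next_upc("dispatch", 5, 0, 12, zero=False, busy=False))
```

Moving this selection into logic outside the ROM is what lets each ROM row store a short jump type instead of a full next-state address for every input combination.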

Instruction Fetch and ALU

State    Control Points          Next State
fetch0   MA ← PC                 next
fetch1   IR ← Memory             spin
fetch2   A ← PC                  next
fetch3   PC ← A+4                dispatch
…
ALU0     A ← Reg[rs]             next
ALU1     B ← Reg[rt]             next
ALU2     Reg[rd] ← func(A,B)     fetch
…
ALUi0    A ← Reg[rs]             next
ALUi1    B ← sExt16(Imm)         next
ALUi2    Reg[rt] ← Op(A,B)       fetch

Load and Store

State    Control Points          Next State
LW0      A ← Reg[rs]             next
LW1      B ← sExt16(Imm)         next
LW2      MA ← A+B                next
LW3      Reg[rt] ← Memory        spin
LW4                              fetch
…
SW0      A ← Reg[rs]             next
SW1      B ← sExt16(Imm)         next
SW2      MA ← A+B                next
SW3      Memory ← Reg[rt]        spin
SW4                              fetch

Jumps

State    Control Points          Next State
J0       A ← PC                  next
J1       B ← IR                  next
J2       PC ← JumpTarg(A,B)      fetch

JR0      A ← Reg[rs]             next
JR1      PC ← A                  fetch

JAL0     A ← PC                  next
JAL1     Reg[31] ← A             next
JAL2     B ← IR                  next
JAL3     PC ← JumpTarg(A,B)      fetch

JALR0    A ← PC                  next
JALR1    B ← Reg[rs]             next
JALR2    Reg[31] ← A             next
JALR3    PC ← B                  fetch

Complex Instructions

[Figure: the bus-based MIPS datapath again: IR (ldIR, Opcode), ImmExt (ExtSel, enImm), ALU with registers A and B (ldA, ldB, OpSel, ALU control, enALU, zero?), register file with 32 GPRs + PC ... of 32-bit registers (RegSel chooses among rs, rt, rd, 32 (PC), 31 (Link); RegWrt, enReg), and memory with MA (ldMA, MemWrt, enMem, busy), all on a 32-bit bus]

rd ← M[(rs)] op (rt)             Reg-Memory-src ALU op
M[(rd)] ← (rs) op (rt)           Reg-Memory-dst ALU op
M[(rd)] ← M[(rs)] op M[(rt)]     Mem-Mem ALU op

Mem-Mem ALU Instruction

M[(rd)] ← M[(rs)] op M[(rt)]

State    Control Points          Next State
ALUMM0   MA ← Reg[rs]            next
ALUMM1   A ← Memory              spin
ALUMM2   MA ← Reg[rt]            next
ALUMM3   B ← Memory              spin
ALUMM4   MA ← Reg[rd]            next
ALUMM5   Memory ← func(A,B)      spin
ALUMM6                           fetch

• With microcode, complex instructions do not change datapath and only require space for microprogram
• With hardwired control, complex instructions change datapath

Performance

Microcode requires multiple cycles per instruction
• t_C > max(t_reg-reg, t_ALU, t_µROM)

Microcode requires multiple cycles per instruction
• Suppose t_µROM < 0.1 * t_RAM
• Compare against single-cycle, hardwired control
• Good performance even when CPI=10
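
A rough back-of-the-envelope check of the CPI=10 claim follows. All timings are hypothetical; only the ratio between ROM and RAM access time comes from the slide. The sketch also assumes the microcoded cycle time is set by the slowest of the three delays, that a single-cycle hardwired design must fit one main-memory access into its cycle, and it ignores cycles the microcode spends spinning on memory.

```python
# Hypothetical timings in nanoseconds.
t_ram = 1000
t_urom = 90            # chosen so that t_urom < 0.1 * t_ram
t_reg_reg = 50         # assumed faster than the ROM
t_alu = 80             # assumed faster than the ROM

# Cycle time of the microcoded machine is bounded by the slowest step.
t_cycle_microcoded = max(t_reg_reg, t_alu, t_urom)
cpi = 10
time_microcoded = cpi * t_cycle_microcoded

# Assumed cost of one instruction on a single-cycle hardwired design.
time_single_cycle = t_ram

print(time_microcoded, time_single_cycle)
```

With these made-up numbers, ten microcode steps take no longer than one cycle of the single-cycle design, which is the sense in which a CPI of 10 could still be competitive.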

Microprogramming Advantages (1970s)
• ROMs were much faster than DRAMs
• Datapath, control were simpler for complex instruction sets
• Fixing bugs in the controller was easy
• ISA compatibility across models was easy

Decline of Microprogramming

Increasingly complex microcode
• Complex instruction sets led to subroutine, call stacks in microcode
• Fixing bugs difficult with read-only nature of ROMs

Evolving Technology
• VLSI made RAMs faster
• Microarchitectural innovations (pipelining, caches, buffers) made multi-cycle, register-to-register execution unattractive

Evolving Software
• Better compilers made complex instructions less important
• Most complex instructions are rarely used

Transition to RISC
- Build fast instruction cache
- Use software subroutines, not hardware subroutines
- Use simple ISA to enable hardwired implementations

Modern Microprogramming

Microprogramming is far from extinct

Important role in early microprocessors (1980s)
• Examples: Intel 386, 486

Assisting role in modern microprocessors
• Most instructions executed directly (hardwired control)
• Infrequent, complicated instructions invoke microcode engine
• Examples: AMD Athlon, Intel Core 2 Duo, IBM Power PC

• Patchable microcode common for post-fabrication bug fixes
• Example: Intel Pentiums load microcode patches at bootup

Acknowledgements

These slides contain material developed and copyrighted by
- Arvind (MIT)
- Krste Asanovic (MIT/UCB)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
- Alvin Lebeck (Duke)
- David Patterson (UCB)
- Daniel Sorin (Duke)