Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Transcript of Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
ECE 4100/6100 Advanced Computer Architecture
Lecture 4: ISA Taxonomy
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
Instruction Set Architecture
• Specification of a microprocessor design
• Interface between user and machine’s functionality
• Good instruction set design principles
  – Compatibility
  – Implementability
  – Programmability
  – Usability
  – Encoding efficiency
Main ISA Design Philosophy
• CISC (Complex Instruction Set Computer)
• RISC (Reduced Instruction Set Computer)
• VLIW (Very Long Instruction Word)
• EPIC (Explicitly Parallel Instruction Computing)
CISC
• Complex Instruction Set Computers
• Close the “semantic gap” between programming and execution
  – Smaller code size (memory was expensive!)
  – Simplify compilation
• Another state machine (controlled by microcode) inside the machine
• Examples: x86, Intel 432, IBM 360, DEC VAX
CISC Example: x86
• MOVSD ;; move a double word, 1-byte instruction
    MOVSD // m32[ES:EDI] = m32[DS:ESI]
• REP ;; 1-byte prefix to repeat string operations
    REP MOVSD // count set up in ECX
    LOCK ADD ds:[esi+ecx*2+0x67452301], 0xEFCDAB89 // 13 bytes
    F0 3E 81 84 4E 01 23 45 67 89 AB CD EF
  – F0: LOCK prefix
  – 3E: DS segment-override prefix
  – 81: opcode (ALU operation on r/m32 with imm32; ModRM reg field 000 selects ADD)
  – 84: ModRM, [--][--]+disp32
  – 4E: SIB, ESI+ECX*2
  – 01 23 45 67: disp32 (0x67452301, little-endian)
  – 89 AB CD EF: imm32 (0xEFCDAB89, little-endian)
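The byte breakdown of the 13-byte LOCK ADD above can be checked mechanically. The following is a minimal sketch, not a general x86 decoder: it only handles the exact prefix/opcode shape from the slide, and the helper names (`split_modrm`, `split_sib`, `decode_lock_add`) are made up for illustration.

```python
import struct

# 32-bit general-purpose registers in x86 encoding order (0..7).
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def split_modrm(byte):
    """Return (mod, reg, rm) from a ModRM byte: 2 + 3 + 3 bits."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111

def split_sib(byte):
    """Return (scale factor, index register, base register) from a SIB byte.

    Special encodings (index 100 = no index, base 101 with mod 00) are
    ignored here because the slide's example does not use them.
    """
    return 1 << (byte >> 6), REG32[(byte >> 3) & 0b111], REG32[byte & 0b111]

def decode_lock_add(code):
    """Decode only the shape shown on the slide: F0 3E 81 /0 with SIB + disp32."""
    assert code[0] == 0xF0, "expected LOCK prefix"
    assert code[1] == 0x3E, "expected DS segment-override prefix"
    assert code[2] == 0x81, "expected the r/m32, imm32 ALU opcode"
    mod, reg, rm = split_modrm(code[3])
    assert reg == 0b000, "reg field 000 selects ADD"
    assert (mod, rm) == (0b10, 0b100), "expected SIB byte followed by disp32"
    scale, index, base = split_sib(code[4])
    # disp32 and imm32 are both stored little-endian.
    disp, imm = struct.unpack_from("<II", code, 5)
    return {"base": base, "index": index, "scale": scale,
            "disp": disp, "imm": imm, "length": len(code)}

code = bytes.fromhex("F0 3E 81 84 4E 01 23 45 67 89 AB CD EF")
fields = decode_lock_add(code)
print(fields)
```

The variable instruction length is the point: the decoder cannot know where the immediate starts until it has examined the ModRM and SIB bytes.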
RISC
• Observation made by IBM (John Cocke, Eckert-Mauchly Award ’85, Turing Award ’87, Nat’l Medal of Technology ’91, Nat’l Medal of Science ’94)
  – Few of the available instructions are used
• CISC: “n+1” phenomenon
  – Adding an instruction requiring an extra level of decoding logic can slow down the entire ISA
• Reduced Instruction Set Computer
  – Originated at IBM in 1975, a telephone project
    • To achieve 12 MIPS (300 calls per sec × 20k inst per call = 6 MIPS of raw call processing, so the target leaves headroom for overhead)
    • Simple instructions
  – IBM 801 in 1978
  – More compiler effort to gain performance
A Typical RISC
• Smaller number of instructions
• Fixed-format instructions (e.g., 32 bits)
• 3-address, reg-to-reg arithmetic instructions
• Single-cycle operation for execution
• Load-store architecture
• Simple address modes
  – Base + displacement
  – No indirection
• Simple branch conditions
• Hardwired control (no microcode)
• More compiler effort
• Examples:
  – RISC I and RISC II at Berkeley
  – MIPS (Microprocessor without Interlocked Pipeline Stages) at Stanford
  – IBM RISC Technology, Sun SPARC, HP PA-RISC, ARM
RISC Example: MIPS
Bit positions are given as [high:low]; every format is 32 bits wide.
• R-format (Register-Register): Op [31:26], Rs [25:21], Rt [20:16], Rd [15:11], Shamt [10:6], Funct [5:0]
    add $1, $2, $3
• I-format (Register-Immediate): Op [31:26], Rs [25:21], Rt [20:16], immediate [15:0]
    addi $1, $2, -5
• I-format (Load/Store): Op [31:26], Base [25:21], Dest [20:16], immediate [15:0]
    lw $1, 24($9)
• I-format (Branch): Op [31:26], Rs [25:21], Rt [20:16], immediate [15:0]
    beq $4, $0, L1
• J-format (Jump / Call): Op [31:26], target [25:0]
    j L2
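Because every MIPS format is a fixed 32-bit word with fields at fixed bit positions, encoding is a handful of shifts and ORs. The sketch below encodes the five example instructions; the opcode and funct values are the standard MIPS32 ones, while the helper names and the program-counter and label addresses (`0x1000`, `0x1010`, `0x00400020`) are assumptions chosen only for illustration.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R-format: Op[31:26] Rs[25:21] Rt[20:16] Rd[15:11] Shamt[10:6] Funct[5:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """I-format: Op[31:26] Rs[25:21] Rt[20:16] immediate[15:0] (two's complement)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(op, target_address):
    """J-format: Op[31:26] target[25:0]; the target is a word address."""
    return (op << 26) | ((target_address >> 2) & 0x03FFFFFF)

def branch_offset(pc, label_address):
    """Branch immediates count words relative to the instruction after the branch."""
    return (label_address - (pc + 4)) >> 2

# add $1, $2, $3  -> rd=1, rs=2, rt=3, funct=0x20
add_word = encode_r(0x00, rs=2, rt=3, rd=1, shamt=0, funct=0x20)
# addi $1, $2, -5 -> rt=1, rs=2
addi_word = encode_i(0x08, rs=2, rt=1, imm=-5)
# lw $1, 24($9)   -> base=9 in the Rs field, dest=1 in the Rt field
lw_word = encode_i(0x23, rs=9, rt=1, imm=24)
# beq $4, $0, L1  -> assumed: branch at 0x1000, L1 at 0x1010
beq_word = encode_i(0x04, rs=4, rt=0, imm=branch_offset(0x1000, 0x1010))
# j L2            -> assumed: L2 at 0x00400020
j_word = encode_j(0x02, 0x00400020)

for name, word in [("add", add_word), ("addi", addi_word), ("lw", lw_word),
                   ("beq", beq_word), ("j", j_word)]:
    print(f"{name:5s} 0x{word:08X}")
```

Compare this with the x86 example: here the opcode, register, and immediate fields can all be extracted in parallel without first parsing earlier bytes.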
CISC vs. RISC
| CISC | RISC |
| --- | --- |
| Variable length instructions | Fixed-length instructions, single-cycle operation |
| Abundant instructions and addressing modes | Fewer instructions and addressing modes |
| Long, complex decoding | Simple decoding |
| Contain mem-to-mem operations | Load/store architecture |
| Use microcode | No microinstructions, directly decoded and executed by HW logic |
| Closer semantic gap (shift complexity to microcode) | Needs smart compilers, or intelligent hardware to reorder instructions |
| IBM 360, DEC VAX, x86, Moto 68030 | IBM 801, MIPS, RISC I, IBM POWER, Sun SPARC |
• Some definitions were from the paper by Colwell et al. in 1985
CISC vs. RISC (Reality)
| | IBM 370/168 | VAX 11/780 | Xerox Dorado | IBM 801 | Berkeley RISC1 | Stanford MIPS |
| --- | --- | --- | --- | --- | --- | --- |
| Class | CISC | CISC | CISC | RISC | RISC | RISC |
| Year introduced | 1973 | 1978 | 1978 | 1980 | 1981 | 1983 |
| # instructions | 208 | 303 | 270 | 120 | 39 | 55 |
| Microcode | 54KB | 61KB | 17KB | 0 | 0 | 0 |
| Instruction size | 2 to 6 B | 2 to 57 B | 1 to 3 B | 4 B | 4 B | 4 B |
| Execution model | Reg-reg, Reg-mem, Mem-mem | Reg-reg, Reg-mem, Mem-mem | Stack | Reg-reg | Reg-reg | Reg-reg |
Observation and Controversy
• “Instruction Sets and Beyond: Computers, Complexity, and Controversy” by Bob Colwell (Eckert-Mauchly Award, 2005) and colleagues from CMU; see also the response from the RISC camp: Patterson (Eckert-Mauchly Award, 2008) and Hennessy (Eckert-Mauchly Award, 2001)
• CISC/RISC classification should *not* be a dichotomy
• Case in point: MicroVAX-32 by DEC, a single-chip implementation
  – Subsets the VAX instructions (but still, 175 instructions!)
  – Emulates the complex instructions
  – A RISC or a CISC? (Well, it has variable-length instructions, is not a ld/st machine, uses microcode control, and has all the VAX addressing modes)
• Effective processor design = CISC experiences + RISC tenets
• RISC features are not incompatible or mutually exclusive
  – Large register file (w/ register windows)
• RISC/CISC issues are best considered in light of their function-to-implementation level assignment
Modern x86 Machine Design
• CISC outfit
• RISC inside
• E.g., Intel P6/NetBurst/Core, AMD Athlon/Phenom/Opteron
• Each x86 instruction is decoded into “micro-ops” (μops) or “RISC-ops” on the fly
• Internal microarchitecture resembles RISC design philosophy
• Processor dynamically schedules μops
• Compiler’s scheduling is still beneficial
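The "CISC outfit, RISC inside" idea can be illustrated with a toy decoder that cracks a read-modify-write instruction into load/ALU/store micro-ops. This is a hypothetical sketch only: real micro-op formats are internal to each vendor, and the `crack` function, the tuple layout, and the `tmp` register name are invented here.

```python
def is_memory(operand):
    """Operands written in brackets, e.g. '[esi+ecx*2]', are memory references."""
    return operand.startswith("[")

def crack(mnemonic, dst, src):
    """Split one two-operand ALU instruction into RISC-like micro-ops.

    Only the destination may be a memory operand in this toy model.
    Each micro-op is a (operation, destination, source) tuple.
    """
    if is_memory(src):
        raise ValueError("toy model: only the destination may be in memory")
    if not is_memory(dst):
        # Register-to-register form already looks like a RISC operation.
        return [(mnemonic, dst, src)]
    return [
        ("load", "tmp", dst),     # read the memory operand into a temporary
        (mnemonic, "tmp", src),   # do the ALU work on registers only
        ("store", dst, "tmp"),    # write the result back to memory
    ]

print(crack("add", "eax", "ebx"))
print(crack("add", "[esi+ecx*2]", "eax"))
```

After cracking, each micro-op touches memory or the ALU but not both, which is what lets the out-of-order core schedule them independently.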
Recent ISA Design Trend
• Look at this instruction in MIPS (CISC or RISC?)
    CABS.LE.PS $fcc0, $f8, $f10 ;; |y| ≤ |w|, |x| ≤ |w|?
• Many complex instructions emerged for new apps
  – Viterbi instruction for wireless communication/DSP
  – Sum of absolute differences in SSE (PSADBW) or other DSP: C = Σ|A_i - B_i| for MPEG (motion estimation)
• In embedded domain, code size is critical
• Reducing programming efforts
• Optimizing performance via
  – Specialized hardware (accelerator-based)
  – Co-processor (controlled by main processor)
  – ISA plug-in (flexible)
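To make the sum-of-absolute-differences operation concrete, here is a scalar sketch of what a SAD instruction computes in one step, plus its use in picking the best-matching block for motion estimation. The function names and the sample pixel values are illustrative assumptions, not taken from the lecture.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def best_match(target, candidates):
    """Index of the candidate block with the smallest SAD against the target."""
    return min(range(len(candidates)), key=lambda i: sad(target, candidates[i]))

target = [10, 20, 30, 40]
candidates = [
    [0, 0, 0, 0],
    [12, 18, 30, 45],
    [10, 20, 30, 41],
]
print(sad(target, candidates[1]))
print(best_match(target, candidates))
```

A motion estimator runs this inner loop for many candidate positions per block, which is why a dedicated instruction that handles several pixels at once pays off.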
VLIW
• Very Long Instruction Word
  – Originated from microcode compaction
  – Coined by Josh Fisher (Eckert-Mauchly Award, 2003)
• Compiler will
  – Perform instruction scheduling (latency-aware)
  – Pack several independent instructions into a VLIW instruction
• Issues
  – Compatibility
  – Many nops
  – Very complex compiler
    • Information unavailable at static compile time
    • Interprocedural optimization is difficult
Pioneers
• Culler Scientific
  – Led by Prof. Glen J. Culler (National Medal of Technology winner 2000, Berkeley Prof. David Culler’s father)
• Multiflow (Fisher)
  – Led by Josh Fisher (Eckert-Mauchly Award 2003), John O’Donnell, John Ruttenberg, David Papworth, Bob Colwell (Eckert-Mauchly Award 2005), Geoffrey Lowney, etc.
  – Several Multiflow TRACE machines were delivered
• Cydrome (Rau and the Yens) in the 80’s
  – Led by Bob Rau (Eckert-Mauchly Award 2002), David Yen, Wei Yen, etc.
  – Had a working prototype
Modern Processors
• Most DSPs embrace VLIW (e.g., TI C6x, StarCore, ADI TigerSHARC, etc.)
• Transmeta Crusoe (internal, never released ISA)
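The compiler's packing job, and the reason VLIW code ends up with many nops, can be sketched with a small greedy scheduler. This is a simplified illustration under stated assumptions, not how any production VLIW compiler works: every operation has unit latency, each register is written only once (so only true dependences matter), and any operation can go in any slot. The `Op` type and `pack_bundles` name are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    dest: str
    srcs: tuple = ()

def pack_bundles(ops, width):
    """Greedily place each op in the earliest bundle that satisfies its inputs.

    Assumes unit latency and single assignment of registers, so an op may
    issue one bundle after the latest producer of any of its sources.
    """
    ready_at = {}   # register name -> first bundle index that may read it
    bundles = []
    for op in ops:
        slot = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(op.name)
        ready_at[op.dest] = slot + 1
    # Unused slots must still be encoded, so pad them with nops.
    return [b + ["nop"] * (width - len(b)) for b in bundles]

program = [
    Op("ld1", "r1", ("r9",)),
    Op("ld2", "r2", ("r9",)),
    Op("add", "r3", ("r1", "r2")),
    Op("mul", "r4", ("r3",)),
    Op("sub", "r5", ("r9",)),
]
for bundle in pack_bundles(program, width=2):
    print(bundle)
```

Note that the schedule is only valid for the latencies and issue width it was built for, which is the compatibility issue listed above: a machine with a wider word or slower loads needs the program repacked.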
Intel/HP EPIC
• Explicitly Parallel Instruction Computing
• A close relative of VLIW (e.g., the compiler holds the key to high performance)
• Some new features
  – Stop bits to address compatibility
  – ISA enabling data speculation and control speculation (minimum hardware support needed)
  – Fully predicated ISA
  – Rotating registers, RSE (not so new, e.g., MRS in RISC I)
• Lots of ideas from Polycyclic architecture (TRW) and Cydrome by the late Bob Rau (Eckert-Mauchly Award, 2002)
An Itanium Instruction Bundle:
    ld4 r43=[r38]
    add r38=16,r38
    br.call.sptk b0=printf# ;;
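An Itanium bundle is 128 bits: a 5-bit template in the low bits followed by three 41-bit instruction slots, with the template encoding both the unit types of the slots and where the stops fall. The sketch below packs and unpacks that layout; the function names are assumptions, and the slot contents are placeholder integers rather than real encodings of the three instructions above.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1

def pack_bundle(template, slots):
    """Combine a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("need exactly three slots of at most 41 bits each")
    value = template
    for i, slot in enumerate(slots):
        value |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return value

def unpack_bundle(value):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = value & ((1 << TEMPLATE_BITS) - 1)
    slots = [(value >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots

# Placeholder slot values; 0x11 is just an example template number.
bundle = pack_bundle(0x11, [0x123, 0x456, 0x789])
print(hex(bundle))
print(unpack_bundle(bundle))
```

Because the stops live in the template rather than in the bundle width, a wider implementation can issue more than one bundle per cycle without recompilation, which is how EPIC addresses the VLIW compatibility problem.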
VLIW Tradeoffs
• Plentiful registers, simple encodings, …
• Potentially lower # of transistors than other designs
  – Reduced speculation, OoO not needed
  – Size efficiencies, price, power consumption
  – Is this true for Itanium?
• Drawbacks
  – Backward compatibility or upgradeability
  – Due to exposed implementation details
• VLIW is orthogonal to other techniques
  – Pipeline, SMT, and CMP/Multi-core can be built on top of processors including VLIW
Design Philosophy: VLIW vs. Superscalar
[Figure: the same normal source code takes two paths. Superscalar: a normal compiler produces RISC object code, and scheduling and operation-independence-recognizing hardware does the work at run time. VLIW: a normal compiler plus scheduling and operation-independence-recognizing software does the work at compile time. The same ILP hardware is used in both cases.]
Design Philosophy: VLIW vs. Superscalar
• VLIW
  – Requires less hardware and lower power
  – Programs need to be changed to run correctly when even small changes are made to the implementation (not always, though)
• Superscalar
  – Object-code compatible
    • Sequential programs can be presented to different superscalar implementations of the same ISA
Superscalar or VLIW?
• Reality: the current world is dominated by …
  – x86: Core (quad-issue) & Atom (dual-issue)
  – And ARM (Cortex-A8 is dual-issue; A9 has OoO)
• VLIW is largely embraced by the DSP camp
Should we continue to teach this Chapter about ISA?