Instruction Set Architecture
April 21, 2023 204521 Digital System Architecture
Instruction Set Architecture
Pradondet Nilagupta
Spring 2001
(original notes from Prof. Mike Schulte )
Overview of ISA (1/2)
Concentrate on the ISA
Introduce a wide variety of design alternatives for instruction set architectures
– Focus on four topics
• Classify the instruction set alternatives
– Give a qualitative assessment of the advantages and disadvantages of the various approaches
• Present and analyze instruction set measurements that are largely independent of a specific instruction set
Overview of ISA (2/2)
• Address the issue of languages and compilers and their bearing on the ISA
• Show how these ideas are reflected in the DLX instruction set, which is typical of recent instruction set architectures
Examine a wide variety of architectural measurements
– Measurements depend on the programs measured and on the compilers used in making these measurements
Hot Topics in Computer Architecture
1950s and 1960s:
– Computer Arithmetic
1970s and 1980s:
– Instruction Set Design
– ISA Appropriate for Compilers
1990s:
– Design of CPU
– Design of memory system
– Design of I/O system
– Multiprocessors
– Instruction Set Extensions
Instruction Set Architecture
Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.
The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.
Instruction Set Architecture
The instruction set architecture serves as the interface between software and hardware
[Figure: the instruction set shown as the layer between software and hardware]
Interface Design
A good interface:
– Lasts through many implementations (portability, compatibility)
– Is used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
What Are the Components of an ISA? (1/2)
Sometimes known as the Programmer’s Model of the machine
Storage cells
– General and special purpose registers in the CPU
– Many general purpose cells of the same size in memory
– Storage associated with I/O devices
The machine instruction set
– The instruction set is the entire repertoire of machine operations
– Makes use of storage cells, formats, and results of the fetch/execute cycle
– i.e., register transfers
What Are the Components of an ISA? (2/2)
The instruction format
– Size and meaning of fields within the instruction
The nature of the fetch-execute cycle
– Things that are done before the operation code is known
Programmer’s Models of Various Machines
[Figure: programmer's models of four machines, showing each machine's registers, main memory capacity, and instruction count]
M6800 (introduced 1975): 2^16 bytes of main memory capacity; fewer than 100 instructions; 6 special purpose registers (A, B, IX, SP, PC, Status)
I8086 (introduced 1979): 2^20 bytes of main memory capacity; more than 120 instructions; data registers (AX, BX, CX, DX), address and count registers (SP, BP, SI, DI), memory segment registers (CS, DS, SS, ES), IP, and Status
VAX11 (introduced 1981): 2^32 bytes of main memory capacity; more than 300 instructions; 12 general purpose registers (R0 to R11) plus AP, FP, SP, PC, and PSW
PPC601 (introduced 1993): 2^52 bytes of main memory capacity; more than 250 instructions; 32 64-bit floating point registers, 32 32-bit general purpose registers, and more than 50 32-bit special purpose registers
What Must an Instruction Specify? (1/2)
Which operation to perform: add r0, r1, r3
– Op code: add, load, branch, etc.
Where to find the operand or operands: add r0, r1, r3
– In CPU registers, memory cells, I/O locations, or part of the instruction
Place to store the result: add r0, r1, r3
– Again a CPU register or memory cell
What Must an Instruction Specify? (2/2)
Location of the next instruction:
add r0, r1, r3
br endloop
– Almost always the memory cell pointed to by the program counter (PC)
Instruction format (encoding)
– How is it decoded?
Sometimes there is no operand, or no result, or no next instruction. Can you think of examples?
Instructions Can Be Divided into 3 Classes (1/2)
Data movement instructions
– Move data from a memory location or register to another memory location or register without changing its form
– Load: source is memory and destination is register
– Store: source is register and destination is memory
Arithmetic and logic (ALU) instructions
– Change the form of one or more operands to produce a result stored in another location
– Add, Sub, Shift, etc.
Instructions Can Be Divided into 3 Classes (2/2)
Branch instructions (control flow instructions)
– Alter the normal flow of control from executing the next instruction in sequence
– Br Loc, Brz Loc2: unconditional or conditional branches
Examples of Data Movement Instructions
Instruction       Meaning                                                     Machine
MOV A, B          Move 16 bits from memory location A to location B           VAX11
LDA A, Addr       Load accumulator A with the byte at memory location Addr    M6800
lwz R3, A         Move 32-bit data from memory location A to register R3      PPC601
li $3, 455        Load the 32-bit integer 455 into register $3                MIPS R3000
mov R4, dout      Move 16-bit data from R4 to output port dout                DEC PDP11
IN AL, KBD        Load a byte from input port KBD to the accumulator          Intel Pentium
LEA.L (A0), A2    Load the address pointed to by A0 into A2                   M68000
Examples of ALU Instructions
Instruction        Meaning                                                                              Machine
MULF A, B, C       Multiply the 32-bit floating point values at memory locations A and B, store at C    VAX11
nabs r3, r1        Store the negative absolute value of r1 in r3                                        PPC601
ori $2, $1, 255    Store the logical OR of reg $1 with 255 into reg $2                                  MIPS R3000
DEC R2             Decrement the 16-bit value stored in reg R2                                          DEC PDP11
SHL AX, 4          Shift the 16-bit value in reg AX left by 4 bit positions                             Intel 8086
• Notice again the complete dissimilarity of both syntax and semantics.
Examples of Branch Instructions
Instruction       Meaning                                                                                                     Machine
BLBS A, Tgt       Branch to address Tgt if the least significant bit of memory location A is set (i.e., = 1)                  VAX11
bun r2            Branch to location in R2 if the result of the previous floating point computation was Not a Number (NaN)    PPC601
beq $2, $1, 32    Branch to location (PC + 4 + 32) if the contents of $1 and $2 are equal                                     MIPS R3000
SOB R4, Loop      Decrement R4 and branch to Loop if R4 ≠ 0                                                                   DEC PDP11
JCXZ Addr         Jump to Addr if the contents of register CX = 0                                                             Intel 8086
ISA Metrics
Orthogonality
– No special registers, few special cases, all operand modes available with any data type or instruction type
Completeness
– Support for a wide range of operations and target applications
Regularity
– No overloading for the meanings of instruction fields
Streamlined
– Resource needs easily determined
Ease of compilation (programming?), ease of implementation, scalability
Instruction Set Design Issues (1/2)
Instruction set design issues include:
– Where are operands stored?
• registers, memory, stack, accumulator
– How many explicit operands are there?
• 0, 1, 2, or 3
– How is the operand location specified?
• register, immediate, indirect, . . .
– What type and size of operands are supported?
• byte, int, float, double, string, vector, . . .
Instruction Set Design Issues (2/2)
– What operations are supported?
• add, sub, mul, move, compare, . . .
– How to encode them into instruction format?
• Instructions should be multiples of bytes.
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
– High-level Language Based (B5000 1963)
– Concept of a Family (IBM 360 1964)
General Purpose Register Machines
– Complex Instruction Sets (Vax, Intel 8086 1977-80)
– Load/Store Architecture (CDC 6600, Cray 1 1963-76)
• RISC (Mips, Sparc, 88000, IBM RS6000, . . . 1987+)
Evolution of Instruction Sets
Major advances in computer architecture are typically associated with landmark instruction set designs
– Ex: Stack vs. GPR (System 360)
Design decisions must take into account:
– technology
– machine organization
– programming languages
– compiler technology
– operating systems
The design decisions in turn influence these.
Classifying ISAs
Accumulator (before 1960):
1 address    add A             acc ← acc + mem[A]
Stack (1960s to 1970s):
0 address    add               tos ← tos + next
Memory-Memory (1970s to 1980s):
2 address    add A, B          mem[A] ← mem[A] + mem[B]
3 address    add A, B, C       mem[A] ← mem[B] + mem[C]
Register-Memory (1970s to present):
2 address    add R1, A         R1 ← R1 + mem[A]
             load R1, A        R1 ← mem[A]
Register-Register (Load/Store) (1960s to present):
3 address    add R1, R2, R3    R1 ← R2 + R3
             load R1, R2       R1 ← mem[R2]
             store R1, R2      mem[R1] ← R2
Comparison of ISA Classes
Code sequence for C = A + B
Stack     Accumulator    Register (register-memory)    Register (load/store)
Push A    Load A         Load R1, A                    Load R1, A
Push B    Add B          Add R1, B                     Load R2, B
Add       Store C        Store C, R1                   Add R3, R1, R2
Pop C                                                  Store C, R3
Memory efficiency? Instruction access? Data access?
Ex. Expression Evaluation for 3-, 2-, 1-, and 0-Address Machines
Number of instructions and number of addresses both vary
Discuss as examples: size of code in each case
Evaluate a = (b+c)*d - e
3-address      2-address    1-address    Stack
add a, b, c    load a, b    load b       push b
mpy a, a, d    add a, c     add c        push c
sub a, a, e    mpy a, d     mpy d        add
               sub a, e     sub e        push d
                            store a      mpy
                                         push e
                                         sub
                                         pop a
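The "size of code in each case" question can be worked through with a small sketch. It assumes the field widths shown on the instruction-format slides of this deck (an 8-bit operation field and 24-bit memory addresses); the function names are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Field widths in bytes, assumed from the instruction-format slides:
   an 8-bit opcode and 24-bit memory addresses. */
enum { OPCODE_BYTES = 1, ADDRESS_BYTES = 3 };

/* Bytes of code for `count` instructions that each carry `addresses`
   explicit memory addresses. */
size_t code_bytes(size_t count, size_t addresses)
{
    assert(addresses <= 3);   /* the slides cover 0- to 3-address formats */
    return count * (OPCODE_BYTES + addresses * ADDRESS_BYTES);
}

/* Code size for a = (b+c)*d - e in each machine style. */
size_t three_address_bytes(void) { return code_bytes(3, 3); }  /* 3 x 10 = 30 */
size_t two_address_bytes(void)   { return code_bytes(4, 2); }  /* 4 x 7  = 28 */
size_t one_address_bytes(void)   { return code_bytes(5, 1); }  /* 5 x 4  = 20 */

size_t stack_bytes(void)
{
    /* 4 pushes and 1 pop carry an address; add, mpy, and sub carry none. */
    return code_bytes(5, 1) + code_bytes(3, 0);                /* 20 + 3 = 23 */
}
```

Under these assumptions the 3-address version has the fewest instructions but the largest code, which is the trade-off the slide asks about.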
Stack Architectures
Instruction set:
add, sub, mult, div, . . .
push A, pop A
Example: A*B - (A+C*B)
Instruction    Stack contents after execution (top of stack first)
push A         A
push B         B, A
mul            A*B
push A         A, A*B
push C         C, A, A*B
push B         B, C, A, A*B
mul            B*C, A, A*B
add            A+B*C, A*B
sub            result
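The push/mul/add/sub sequence for A*B - (A+C*B) can be traced with a minimal stack-machine sketch. The `Stack` type and function names are hypothetical, and the sketch assumes that `sub` computes (second on stack) minus (top of stack), which is what the example needs to produce the stated result.

```c
#include <assert.h>

enum { STACK_DEPTH = 16 };

typedef struct {
    long cells[STACK_DEPTH];
    int  count;                 /* number of occupied cells */
} Stack;

void push(Stack *s, long value)
{
    assert(s->count < STACK_DEPTH);
    s->cells[s->count++] = value;
}

long pop(Stack *s)
{
    assert(s->count > 0);
    return s->cells[--s->count];
}

/* Each ALU operation pops two operands and pushes one result. */
void mul(Stack *s) { long top = pop(s); long next = pop(s); push(s, next * top); }
void add(Stack *s) { long top = pop(s); long next = pop(s); push(s, next + top); }
void sub(Stack *s) { long top = pop(s); long next = pop(s); push(s, next - top); }

/* Runs the instruction sequence from the slide. */
long eval_stack(long a, long b, long c)
{
    Stack s = { .count = 0 };
    push(&s, a); push(&s, b); mul(&s);      /* A*B */
    push(&s, a); push(&s, c); push(&s, b);
    mul(&s);                                /* C*B */
    add(&s);                                /* A + C*B */
    sub(&s);                                /* A*B - (A + C*B) */
    return pop(&s);
}
```

For example, `eval_stack(2, 3, 4)` evaluates 2*3 - (2 + 4*3) = -8 using nine instructions and no explicit operand names on the ALU operations.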
The 0-Address, or Stack, Machine and Instruction Format
[Figure: the CPU holds the program counter and the stack (TOS, SOS, etc.); memory holds the operand Op1 at Op1Addr and the next instruction Nexti at NextiAddr]
Instruction formats:
– push Op1Addr (TOS ← Op1): 8-bit operation field, 24-bit address field
– add (TOS ← TOS + SOS): 8-bit operation field only
Which operation: the operation field
Where to find operands, and where to put the result: on the stack
Where to find the next instruction: the program counter
Stacks: Pros and Cons
Pros
– Good code density (implicit top of stack)
– Low hardware requirements
– Easy to write a simple compiler for stack architectures
Cons
– Stack becomes the bottleneck
– Little ability for parallelism or pipelining
– Data is not always at the top of the stack when needed, so additional instructions like TOP and SWAP are needed
– Difficult to write an optimizing compiler for stack architectures
Accumulator Architectures
Instruction set:
add A, sub A, mult A, div A, . . .
load A, store A
Example: A*B - (A+C*B)
Instruction    Effect
load B         acc = B
mul C          acc = B*C
add A          acc = A+B*C
store D        D = A+B*C
load A         acc = A
mul B          acc = A*B
sub D          acc = result
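The accumulator sequence above can be checked with a small interpreter sketch. The `Opcode` and `Instr` types, the four-word memory, and the function names are hypothetical; the program table is the slide's sequence load B, mul C, add A, store D, load A, mul B, sub D.

```c
#include <assert.h>

typedef enum { LOAD, STORE, ADD, SUB, MUL } Opcode;

typedef struct {
    Opcode op;
    int    addr;                /* index into data memory */
} Instr;

/* Data memory layout: one word each for A, B, C, and the temporary D. */
enum { A, B, C, D, MEM_WORDS };

/* Every instruction has one memory operand; the accumulator is implicit. */
long run_accumulator(const Instr *program, int length, long *mem)
{
    long acc = 0;
    for (int pc = 0; pc < length; pc++) {
        int addr = program[pc].addr;
        assert(addr >= 0 && addr < MEM_WORDS);
        switch (program[pc].op) {
        case LOAD:  acc = mem[addr];  break;
        case STORE: mem[addr] = acc;  break;
        case ADD:   acc += mem[addr]; break;
        case SUB:   acc -= mem[addr]; break;
        case MUL:   acc *= mem[addr]; break;
        }
    }
    return acc;
}

/* Evaluates A*B - (A + C*B) with the slide's instruction sequence. */
long eval_accumulator(long a, long b, long c)
{
    static const Instr program[] = {
        { LOAD, B }, { MUL, C }, { ADD, A }, { STORE, D },
        { LOAD, A }, { MUL, B }, { SUB, D },
    };
    long mem[MEM_WORDS] = { [A] = a, [B] = b, [C] = c, [D] = 0 };
    return run_accumulator(program, (int)(sizeof program / sizeof program[0]), mem);
}
```

Note that every one of the seven instructions touches memory, which illustrates the high memory traffic attributed to accumulator machines.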
1-Address Machine and Instruction Format
Special CPU register, the accumulator, supplies 1 operand and stores the result
One memory address is used for the other operand
Need instructions to load and store operands:
– LDA OpAddr
– STA OpAddr
[Figure: the CPU holds the program counter and the accumulator; memory holds Op1 at Op1Addr and the next instruction Nexti at NextiAddr]
Instruction format:
– add Op1Addr (Acc ← Acc + Op1): 8-bit operation field, 24-bit address field
Which operation: the operation field
Where to find operand 1: the address field
Where to find operand 2, and where to put the result: the accumulator
Where to find the next instruction: the program counter
Accumulators: Pros and Cons
Pros
– Very low hardware requirements
– Easy to design and understand
Cons
– Accumulator becomes the bottleneck
– Little ability for parallelism or pipelining
– High memory traffic
Memory-Memory Architectures
Instruction set:
(3 operands)    add A, B, C    sub A, B, C    mul A, B, C
(2 operands)    add A, B       sub A, B       mul A, B
Example: A*B - (A+C*B)
3 operands (result in E)    2 operands (result in D)
mul D, A, B                 mov D, A
mul E, C, B                 mul D, B
add E, A, E                 mov E, C
sub E, D, E                 mul E, B
                            add E, A
                            sub D, E
The 2-Address Machine and Instruction Format
Result overwrites Operand 2
Needs only 2 addresses in the instruction, but there is less choice in placing data
[Figure: the CPU holds the program counter; memory holds Op1 at Op1Addr, Op2 (later the result) at Op2Addr, and the next instruction Nexti at NextiAddr]
Instruction format:
– add Op2Addr, Op1Addr (Op2 ← Op2 + Op1): 8-bit operation field, two 24-bit address fields
Which operation: the operation field
Where to put the result: Op2Addr
Where to find operands: Op2Addr and Op1Addr
Where to find the next instruction: the program counter
Memory-Memory: Pros and Cons
Pros
– Requires fewer instructions (especially if 3 operands)
– Easy to write compilers for (especially if 3 operands)
Cons
– Very high memory traffic (especially if 3 operands)
– Variable number of clocks per instruction
– With two operands, more data movements are required
Register-Memory Architectures
Instruction set:
add R1, A    sub R1, A    mul R1, B
load R1, A   store R1, A
Example: A*B - (A+C*B)
load R1, C
mul R1, B      /* C*B */
add R1, A      /* A + C*B */
store R1, D
load R2, A
mul R2, B      /* A*B */
sub R2, D      /* A*B - (A + C*B) */
Register-Memory: Pros and Cons
Pros
– Some data can be accessed without loading first
– Instruction format easy to encode
– Good code density
Cons
– Operands are not equivalent (poor orthogonality)
– Variable number of clocks per instruction
– May limit number of registers
Load-Store Architectures
Instruction set:
add R1, R2, R3    sub R1, R2, R3    mul R1, R2, R3
load R1, R4       store R1, R4
Example: A*B - (A+C*B)
load R1, &A
load R2, &B
load R3, &C
load R4, R1
load R5, R2
load R6, R3
mul R7, R6, R5     /* C*B */
add R8, R7, R4     /* A + C*B */
mul R9, R4, R5     /* A*B */
sub R10, R9, R8    /* A*B - (A+C*B) */
The 3-Address Machine and Instruction Format
Address of the next instruction is kept in a processor state register, the PC (except for explicit branches/jumps)
Rest of the addresses are in the instruction
– Discuss: savings in instruction word size
[Figure: memory holds Op1 at Op1Addr, Op2 at Op2Addr, the result Res at ResAddr, and the next instruction Nexti at NextiAddr; the CPU holds the program counter]
Instruction format:
– add ResAddr, Op1Addr, Op2Addr (Res ← Op2 + Op1): 8-bit operation field, three 24-bit address fields
Which operation: the operation field
Where to put the result: ResAddr
Where to find operands: Op1Addr and Op2Addr
Where to find the next instruction: the program counter
Load-Store: Pros and Cons
Pros
– Simple, fixed-length instruction encoding
– Instructions take a similar number of cycles
– Relatively easy to pipeline
Cons
– Higher instruction count
– Not all instructions need three operands
– Dependent on a good compiler
Registers: Advantages and Disadvantages
Advantages
– Faster than cache (no addressing mode or tags)
– Deterministic (no misses)
– Can replicate (multiple read ports)
– Short identifier (typically 3 to 8 bits)
– Reduce memory traffic
Disadvantages
– Need to save and restore on procedure calls and context switches
– Can’t take the address of a register (for pointers)
– Fixed size (can’t store strings or structures efficiently)
– Compiler must manage
General Register Machine and Instruction Formats
[Figure: the CPU holds the program counter and the registers (R2, R4, R6, R8, ...); memory holds Op1 at Op1Addr and the next instruction Nexti]
Instruction formats:
– load R8, Op1Addr (R8 ← Op1): operation, register, and memory address fields
– add R2, R4, R6 (R2 ← R4 + R6): operation field and three register fields
General Register Machine and Instruction Formats
It is the most common choice in today’s general-purpose computers
Which register is specified by a small “address” (3 to 6 bits for 8 to 64 registers)
Load and store have one long and one short address: 1½ addresses
Arithmetic instructions have 3 “half” addresses
Real Machines Are Not So Simple
Most real machines have a mixture of 3-, 2-, 1-, 0-, and 1½-address instructions
A distinction can be made on whether arithmetic instructions use data from memory
If ALU instructions only use registers for operands and result, the machine type is load-store
– Only load and store instructions reference memory
Other machines have a mix of register-memory and memory-memory instructions
Byte Ordering
Idea
– Bytes in a long word are numbered 0 to 3
– Which is most (least) significant?
– Can cause problems when exchanging binary data between machines
Big Endian: Byte 0 is most, 3 is least
– IBM 360/370, Motorola 68K, Sparc
Little Endian: Byte 0 is least, 3 is most
– Intel x86, VAX
Alpha
– Chip can be configured to operate either way
– DEC workstations are little endian
– Cray T3E Alphas are big endian
Byte Ordering Example (1/2)
/* Overlay the same 8 bytes as chars, shorts, ints, and a long. */
union {
    unsigned char c[8];
    unsigned short s[4];
    unsigned int i[2];
    unsigned long l[1];
} dw;
[Figure: the 8 bytes c[0] to c[7] overlaid by s[0] to s[3] (2 bytes each), i[0] and i[1] (4 bytes each), and l[0]]
Byte Ordering Example (2/2)
#include <stdio.h>

int main(void)
{
    int j;

    /* Fill the 8 bytes with 0xf0 .. 0xf7 in address order. */
    for (j = 0; j < 8; j++)
        dw.c[j] = 0xf0 + j;

    /* Print the same bytes viewed as chars, shorts, ints, and a long. */
    printf("Characters 0-7 == [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
           dw.c[0], dw.c[1], dw.c[2], dw.c[3],
           dw.c[4], dw.c[5], dw.c[6], dw.c[7]);
    printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",
           dw.s[0], dw.s[1], dw.s[2], dw.s[3]);
    printf("Ints 0-1 == [0x%x,0x%x]\n", dw.i[0], dw.i[1]);
    printf("Long 0 == [0x%lx]\n", dw.l[0]);
    return 0;
}
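Byte Ordering on Alpha
Little Endian
[Figure: bytes f0 f1 f2 f3 f4 f5 f6 f7 at c[0] to c[7]; within each of s[0] to s[3], i[0], i[1], and the 8-byte l[0], the LSB is at the lowest address and the MSB at the highest]
Output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]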
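Byte Ordering on x86
Little Endian
[Figure: bytes f0 f1 f2 f3 f4 f5 f6 f7 at c[0] to c[7]; within each of s[0] to s[3], i[0], i[1], and the 4-byte l[0], the LSB is at the lowest address and the MSB at the highest]
Output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]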
Byte Ordering on Sun
Big Endian
[Figure: bytes f0 f1 f2 f3 f4 f5 f6 f7 at c[0] to c[7]; within each of s[0] to s[3], i[0], i[1], and the 4-byte l[0], the MSB is at the lowest address and the LSB at the highest]
Output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
Big Endian Addressing
With Big Endian addressing, the byte with binary address
x . . . x00
is in the most significant position (big end) of a 32-bit word (IBM, Motorola, Sun, HP).
Byte numbering within consecutive words:
MSB         LSB
0   1   2   3
4   5   6   7
Little Endian Addressing
With Little Endian addressing, the byte with binary address
x . . . x00
is in the least significant position (little end) of a 32-bit word (DEC, Intel).
Byte numbering within consecutive words:
MSB         LSB
3   2   1   0
7   6   5   4
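Because byte order matters when binary data moves between machines, portable code usually assembles multi-byte values explicitly rather than relying on the host's layout. The sketch below is one possible way to do that; the function names are hypothetical and the detection routine only distinguishes the two orders discussed on these slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint32_t word = 0x01020304u;
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);   /* inspect the bytes in address order */
    assert(bytes[0] == 0x04 || bytes[0] == 0x01);  /* mixed orders not handled */
    return bytes[0] == 0x04;
}

/* Assemble a 32-bit value from 4 bytes stored in big-endian order
   (byte 0 is most significant), independent of the host's byte order. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Same bytes interpreted in little-endian order (byte 0 is least significant). */
uint32_t load_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Applied to the bytes f0 f1 f2 f3, `load_be32` yields 0xf0f1f2f3 and `load_le32` yields 0xf3f2f1f0, on any host.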
Operand Alignment
An access to an operand of size s bytes at byte address A is said to be aligned if A mod s = 0
[Figure: a 4-byte operand D0 D1 D2 D3 stored at addresses 40 to 43 is aligned; the same operand stored at addresses 41 to 44 is misaligned]
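The alignment rule A mod s = 0 translates directly into code. This is a minimal sketch with hypothetical function names; `align_up` is an added convenience showing how software can guarantee alignment by rounding an address up.

```c
#include <assert.h>
#include <stdint.h>

/* An access of `size` bytes at byte address `addr` is aligned
   if addr mod size == 0. */
int is_aligned(uintptr_t addr, uintptr_t size)
{
    assert(size != 0);
    return addr % size == 0;
}

/* Round `addr` up to the next multiple of `size`. */
uintptr_t align_up(uintptr_t addr, uintptr_t size)
{
    assert(size != 0);
    return (addr + size - 1) / size * size;
}
```

With the addresses from the figure, a 4-byte access at 40 is aligned, one at 41 is not, and `align_up(41, 4)` gives 44.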
Unrestricted Alignment
If the architecture does not restrict memory accesses to be aligned then
– Software is simple
– Hardware must detect misalignment and make 2 memory accesses
– Expensive detection logic is required
– All references can be made slower
Sometimes unrestricted alignment is required for backwards compatibility
Restricted Alignment
If the architecture restricts memory accesses to be aligned then
– Software must guarantee alignment
– Hardware detects misaligned accesses and traps
– No extra time is spent when data is aligned
Since we want to make the common case fast, having restricted alignment is often a better choice, unless compatibility is an issue.
Addressing Modes (1/3)
Immediate
Add R4, #3
Regs[R4] ← Regs[R4] + 3
[Figure: the operand 3 is held in the instruction itself]
Register
Add R4, R3
Regs[R4] ← Regs[R4] + Regs[R3]
[Figure: the operand is in register R3]
Register Indirect
Add R4, (R1)
Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
[Figure: register R1 holds the memory address of the operand]
Addressing Modes (2/3)
Direct
Add R4, (1001)
Regs[R4] ← Regs[R4] + Mem[1001]
[Figure: the instruction holds the memory address 1001 of the operand]
Memory Indirect
Add R4, @(R3)
Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]]
[Figure: register R3 points to a memory cell that holds the address of the operand]
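Addressing Modes (3/3)
Displacement
Add R4, 100(R1)
Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
[Figure: the operand address is the sum of register R1 and the constant 100]
Scaled
Add R1, 100(R2) [R3]
Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d]
[Figure: the operand address is the sum of the constant 100, register R2, and register R3 scaled by the operand size d]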
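The register-transfer descriptions of the addressing modes can be written out as operand-fetch functions. This is an illustrative sketch only: the `Machine` type, its sizes, and the word-addressed memory are simplifying assumptions, not part of any real ISA.

```c
#include <assert.h>

enum { NUM_REGS = 8, MEM_WORDS = 2048 };

typedef struct {
    long regs[NUM_REGS];
    long mem[MEM_WORDS];        /* word-addressed for simplicity */
} Machine;

/* Bounds-checked memory read, shared by all memory modes. */
long load_word(const Machine *m, long addr)
{
    assert(addr >= 0 && addr < MEM_WORDS);
    return m->mem[addr];
}

/* Each function returns the operand selected by one addressing mode. */
long op_immediate(long value)                      { return value; }
long op_register(const Machine *m, int r)          { return m->regs[r]; }
long op_register_indirect(const Machine *m, int r) { return load_word(m, m->regs[r]); }
long op_direct(const Machine *m, long addr)        { return load_word(m, addr); }

long op_memory_indirect(const Machine *m, int r)
{
    return load_word(m, load_word(m, m->regs[r]));  /* two memory reads */
}

long op_displacement(const Machine *m, long disp, int r)
{
    return load_word(m, disp + m->regs[r]);
}

long op_scaled(const Machine *m, long disp, int base, int index, long d)
{
    return load_word(m, disp + m->regs[base] + m->regs[index] * d);
}
```

The number of `load_word` calls in each function shows the memory cost of the mode: none for immediate and register, one for most modes, and two for memory indirect.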
Addressing Mode Usage
3 programs from SPEC89 on VAX
– Others : 0.
Frequency of addressing mode
Mode                 TeX    spice    gcc
Memory indirect      1%     6%       1%
Scaled               0%     16%      6%
Register indirect    24%    3%       11%
Immediate            43%    17%      39%
Displacement         32%    55%      40%
Displacement Address Size
Average of 5 programs from SPECint92 and SPECfp92.
– X-axis is log2 of displacement.
– 1% of addresses > 16 bits.
[Figure: histogram of displacement sizes; x-axis: number of bits (0 to 14); y-axis: frequency (0% to 30%)]
Immediate Addressing Mode (1/2)
10 Programs from SPECInt92 and SPECfp92
Percentage of operations using immediates
Operation    Integer    FP
Loads        10%        45%
Compares     87%        77%
ALU          58%        78%
All inst.    35%        10%
Immediate Addressing Mode (2/2)
50% to 60% fit within 8 bits
75% to 80% fit within 16 bits
[Figure: distribution of immediate sizes; x-axis: number of bits (0 to 32); y-axis: frequency (0% to 60%)]
Addressing Mode Summary
Important data addressing modes
– Displacement
– Immediate
– Register Indirect
Displacement size should be 12 to 16 bits.
Immediate size should be 8 to 16 bits.
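Whether a constant fits in an immediate or displacement field of a given width is a simple calculation. The sketch below assumes the field is a two's-complement (signed) value; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to hold `value` as a two's-complement field. */
int signed_bits_needed(int32_t value)
{
    int bits = 1;   /* the sign bit alone covers -1 and 0 */
    /* Fold negatives onto non-negatives: -1 -> 0, -128 -> 127, ... */
    uint32_t magnitude = value < 0 ? ~(uint32_t)value : (uint32_t)value;
    while (magnitude != 0) {
        magnitude >>= 1;
        bits++;
    }
    assert(bits >= 1 && bits <= 32);
    return bits;
}

/* True if `value` fits in a signed field of `width` bits. */
int fits_signed(int32_t value, int width)
{
    assert(width >= 1 && width <= 32);
    return signed_bits_needed(value) <= width;
}
```

For instance, the constant 455 from the earlier `li $3, 455` example needs 10 bits, so it fits a 16-bit immediate field but not an 8-bit one.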
Instruction Operations
Arithmetic and Logical:
– add, subtract, and, or, etc.
Data transfer:
– Load, Store, etc.
Control:
– Jump, branch, call, return, trap, etc.
Synchronization:
– Test & Set.
String:
– string move, compare, search.
Top-9 x86 Instructions
Simple instructions dominate instruction frequency.
Rank    Instruction               Frequency
1       Load                      22%
2       Conditional branch        20%
3       Compare                   16%
4       Store                     12%
5       Add                       8%
6       And                       6%
7       Sub                       5%
8       Move register-register    4%
9       Call                      1%
Methods of Testing Condition
Condition code: status bits are set by ALU operations.
– add r1, r2, r3 followed by bz label
– Extra status bits
Condition register:
– cmp r1, r2, r3 followed by bgt r1, label
– Simple, but uses up a register
Compare and branch:
– bgt r1, r2, label
– One instruction
– Too much work per instruction
Conditional Branch Distance
Short displacement fields often sufficient for branch
[Figure: histogram of conditional branch distances; x-axis: bits of branch displacement (0 to 14); y-axis: frequency (0% to 40%)]
Conditional Branch Addressing
PC-relative, since most branch targets are close to the current PC address
– At least 8 bits
Compare Equal/Not Equal is most important for integer programs.
Frequency of comparison types
Comparison    Integer    FP
LT/GE         7%         40%
GT/LE         7%         23%
EQ/NEQ        87%        37%
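PC-relative addressing can be sketched as sign-extending a short displacement field and adding it to the address of the following instruction, as in the earlier `beq $2, $1, 32` example (target = PC + 4 + 32). The function names are hypothetical, and the displacement is treated as a byte offset exactly as written on that slide; real encodings often store the offset in units of instructions instead.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `width` bits of `field` to a 32-bit signed value. */
int32_t sign_extend(uint32_t field, int width)
{
    assert(width >= 1 && width <= 31);
    uint32_t sign = 1u << (width - 1);
    field &= (sign << 1) - 1;               /* keep only the low `width` bits */
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Target of a PC-relative branch: the offset is relative to the
   instruction after the branch (PC + 4, assuming 4-byte instructions). */
uint32_t branch_target(uint32_t pc, uint32_t disp_field, int width)
{
    return pc + 4 + (uint32_t)sign_extend(disp_field, width);
}
```

A signed displacement lets the same short field reach both forward and backward targets, which is why an 8-bit field already covers many branches.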
Data Types and Usage
Byte, half word (16 bits), word (32 bits), double word (64 bits).
Arithmetic:
– Decimal: 4 bits per digit
– Integers: 2’s complement
– Floating-point: IEEE standard (single, double, and extended precision)
Frequency of reference by operand size
Size           Integer    FP
Byte           7%         0%
Half word      19%        0%
Word           74%        31%
Double word    0%         69%
Instruction Format
Fixed
– Operation, address specifier 1, address specifier 2, address specifier 3
– MIPS, SPARC, PowerPC
Variable
– Operation and number of operands, address specifier 1, ..., address specifier n
– VAX
Hybrid
– Intel x86
– Operation, address specifier, address field
– Operation, address specifier 1, address specifier 2, address field
– Operation, address field, address specifier 1, address specifier 2
Summary:
– If code size is most important, use variable format.
– If performance is most important, use fixed format.
Types of Addressing Modes (VAX)
1. Register direct          Ri
2. Immediate (literal)      #n
3. Displacement             M[Ri + #n]
4. Register indirect        M[Ri]
5. Indexed                  M[Ri + Rj]
6. Direct (absolute)        M[#n]
7. Memory indirect          M[M[Ri]]
8. Autoincrement            M[Ri++]
9. Autodecrement            M[Ri--]
10. Scaled                  M[Ri + Rj*d + #n]
Frequency of Immediate Addressing on DLX
Not all instructions can take advantage of immediate addressing.
Operation    SPECint92    SPECfp92
Loads        10%          45%
Compares     87%          77%
ALU ops      58%          78%
Overall      35%          10%
Types of Operations
Arithmetic and Logic    AND, ADD
Data Transfer           MOVE, LOAD, STORE
Control                 BRANCH, JUMP, CALL
System                  OS CALL, VM
Floating Point          ADDF, MULF, DIVF
Decimal                 ADDD, CONVERT
String                  MOVE, COMPARE
Graphics                (DE)COMPRESS
80x86 Instruction Frequency
Rank    Instruction      Frequency
1       load             22%
2       branch           20%
3       compare          16%
4       store            12%
5       add              8%
6       and              6%
7       sub              5%
8       register move    4%
9       call             1%
10      return           1%
Total                    96%
Relative Frequency of Control Instructions
Design hardware to handle branches quickly, since these occur most frequently
Operation      SPECint92    SPECfp92
Call/Return    13%          11%
Jumps          6%           4%
Branches       81%          87%
Frequency of Operand Sizes on 32-bit Load-Store Machine
For floating-point, want good performance for 64-bit operands.
For integer operations, want good performance for 32-bit operands.
Size       SPECint92    SPECfp92
64 bits    0%           69%
32 bits    74%          31%
16 bits    19%          0%
8 bits     7%           0%
Encoding an Instruction Set
The encoding must balance:
– A desire to have as many registers and addressing modes as possible
– The impact of the size of the register and addressing mode fields on the average instruction size, and hence on the average program size
– A desire to have instructions encoded into lengths that will be easy to handle in the implementation
Three Choices for Encoding the Instruction Set
Variable
– Instruction length varies based on opcode and address specifiers
– For example, VAX instructions vary between 1 and 53 bytes
– Good code density, but difficult to decode
Fixed
– Only a single size for all instructions
– For example, DLX, MIPS, PowerPC, and Sparc all have 32-bit instructions
– Not as good code density, but easier to decode
Hybrid
– Have multiple format lengths specified by the opcode
– For example, IBM 360/370 and Intel 80x86
– Compromise between code density and ease of decode
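The "easier to decode" property of a fixed format can be seen in a short sketch: every field sits at a constant bit position, so decoding is a handful of shifts and masks. The layout below (6-bit opcode, three 5-bit register fields, 11-bit function field) is an assumed register-register layout in the style of DLX and MIPS, chosen for illustration; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit register-register layout, most significant field first:
   opcode(6) | rs1(5) | rs2(5) | rd(5) | func(11) */
typedef struct {
    unsigned opcode;
    unsigned rs1;
    unsigned rs2;
    unsigned rd;
    unsigned func;
} RType;

uint32_t encode_rtype(RType f)
{
    /* Each field must fit its width, or it would spill into a neighbor. */
    assert(f.opcode < 64 && f.rs1 < 32 && f.rs2 < 32 && f.rd < 32 && f.func < 2048);
    return ((uint32_t)f.opcode << 26) | ((uint32_t)f.rs1 << 21) |
           ((uint32_t)f.rs2 << 16)    | ((uint32_t)f.rd << 11)  |
            (uint32_t)f.func;
}

RType decode_rtype(uint32_t word)
{
    RType f;
    f.opcode = (word >> 26) & 0x3f;
    f.rs1    = (word >> 21) & 0x1f;
    f.rs2    = (word >> 16) & 0x1f;
    f.rd     = (word >> 11) & 0x1f;
    f.func   =  word        & 0x7ff;
    return f;
}
```

A variable-length format, by contrast, must examine the opcode and each address specifier in turn before it even knows where the next instruction starts.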
Compilers and ISA
Compiler Goals
– All correct programs compile correctly
– Most compiled programs execute quickly
– Most programs compile quickly
– Achieve small code size
– Provide debugging support
Multiple Source Compilers
– Same compiler can compile different languages
Multiple Target Compilers
– Same compiler can generate code for different machines
Compiler Phases
Compilers use phases to manage complexity (fig. 2.18)
– Front end
• Convert language to intermediate form
– High level optimizer
• Procedure inlining and loop transformations
– Global optimizer
• Global and local optimization, plus register allocation
– Code generator (and assembler)
• Dependency elimination, instruction selection, pipeline scheduling
The Impact of Compiler Technology on the Architect’s Decisions
How are variables allocated and addressed?
How many registers are needed to allocate variables appropriately?
Allocation of Variables
Stack
– used to allocate local variables
– grown and shrunk on procedure calls and returns
– register allocation works best for stack-allocated objects
Global data area
– used to allocate global variables and constants
– many of these objects are arrays or large data structures
– impossible to allocate to registers if they are aliased
Heap
– used to allocate dynamic objects
– heap objects are accessed with pointers
– never allocated to registers
Designing ISA to Improve Compilation
Provide enough general purpose registers to ease register allocation (more than 16).
Provide regular instruction sets by keeping the operations, data types, and addressing modes orthogonal.
Provide primitive constructs rather than trying to map to a high-level language.
Simplify trade-offs among alternatives.
Allow compilers to help make the common case fast.
Summary: ISA
Use general purpose registers with a load-store architecture.
Support these addressing modes: displacement, immediate, register indirect.
Support these simple instructions: load, store, add, subtract, move register, shift, compare equal, compare not equal, branch, jump, call, return.
Support these data sizes: 8-, 16-, and 32-bit integers, IEEE FP standard.
Provide at least 16 general purpose registers plus separate FP registers, and aim for a minimal instruction set.