isa

Instruction Set Architectures

Based on slides from Jones & Bartlett

COMP 228, Fall 2009 Instruction Set Architectures 1

Chapter 5 Objectives

• Understand the factors involved in instruction set architecture design.

• Gain familiarity with the diversity of memory addressing modes.

• Understand the concepts of instruction-level pipelining (and parallelism) and its affect upon execution performance.


5.1 Introduction

• This chapter builds upon the ideas in Chapter 4.

• We present a detailed look at different instruction formats, operand types, and memory access methods.

• We will see the interrelation between machine organization and instruction formats.

• This leads to a deeper understanding of computer architecture in general.


5.2 Instruction Formats

Instruction sets are differentiated by the following:• Number of bits per instruction.• Stack-based or register-based.• Number of explicit operands per instruction.• Operand location.• Types of operations.• Type and size of operands.



Instruction set architectures are measured according to:

• Main memory space occupied by a program.

• Instruction complexity.

• Instruction length (in bits).

• Total number of instructions in the instruction set.



In designing/using an instruction set, consideration is given to these variations of features:

• Instruction length (#bits per instruction).– Fixed vs variable length.– Variable length allows using shorter encoding of more

frequent instructions.• Number of operands: typically 0,1,2,3.• Number of addressable registers: RISC allows a

larger register file.• Memory organization.

– Byte- or Word-addressed.• Addressing modes.

– Direct, indirect, indexed, or combinations of these.



• The next consideration for architecture design concerns how the CPU will store data.

• Three choices:1. A stack architecture: instructions operate on data

stored in a stack; 0-address machine.2. A register architecture: instructions operate on data in

registers/memory: 1-address, 2-address, 3-address machines.In choosing one over the other, the tradeoffs include: simplicity (and cost) of hardware design, execution speed (performance) and ease of use (programmability).



• In a stack architecture, instructions and operands are implicitly taken from the stack.– A stack is a storage where data is pushed onto and popped from

the top (of the stack).• In an accumulator architecture (1-address), one of the

operands of a binary operation is implicitly in the accumulator.– The other operand is in memory, creating lots of bus traffic.– It also limits the number of data temporally stored in the

processor.• In a general purpose register (GPR) architecture (2- or

3- address machine), the operands are explicitly specified.– It is faster than accumulator architecture (data needed can be

completely local).


Stack Arithmetic

SP (Top of Stack Pointer)

Memory Memory

Memory

(after push x)

SP x

Memory

y

(after push y)

SP

(after add)

SP x + y

:::: ::::

::::x

::::



• Most systems today are GPR systems.• There are three types:

– Memory-memory where two or three (2 source and 1 destination) operands may be in memory.

– Register-memory where at least one operand must be in a register.

– Load-store where no operands may be in memory. This is typically found in RISC architecture.

• The number of operands and the number of available registers has a direct affect on instruction length (space cost) and execution time (time cost).



• Stack machines use one - and zero-operand instructions.

• LOAD (Push) and STORE (POP) instructions require a single memory address operand.

• Other instructions use operands from the stack implicitly.

• PUSH and POP operations involve only the stack’s top element.

• Binary instructions (e.g., ADD, MULT) use the top two items on the stack, each involving 2 pop and 1 push operations in implementation.



• Stack architectures require us to think about arithmetic expressions a little differently.

• We are accustomed to writing expressions using infix notation, such as: Z = X + Y.

• Stack arithmetic corresponds to using the postfixnotation: Z = XY+.– This is also called reverse Polish notation, (somewhat) in

honor of its Polish inventor, Jan Lukasiewicz (1878 -1956).



• The principal advantage of postfix notation is that parentheses are not used, and has an encoding structure that mimics the corresponding stack operations.

• For example, the infix expression, Z = (X × Y) + (W × U),

becomes: Z = X Y × W U × +

in postfix notation.



• In a stack ISA, the postfix expression, Z = X Y × W U × +

immediately translates into the following code:PUSH XPUSH YMULTPUSH WPUSH UMULTADDPOP Z

Note: MULT involves popping the stack twice and pushing the result back onto the stack



• In comparing with a one-address ISA, like MARIE, the infix expression,

Z = X × Y + W × Uthe translated code becomes:

LOAD XMULT YSTORE TEMPLOAD WMULT UADD TEMPSTORE Z



• In a two-address ISA, (e.g.,Intel, Motorola), the infix expression,

Z = X × Y + W × Uis translated into:

LOAD R1,XMULT R1,YLOAD R2,WMULT R2,UADD R1,R2STORE Z,R1

Note: One-address ISAs often require one operand to be a register.



• With a three-address ISA, (e.g.,mainframes), the infix expression,

Z = X × Y + W × Uis translated into:

MULT R1,X,YMULT R2,W,UADD Z,R1,R2

Would this program execute faster than the corresponding (longer) program that we saw in the stack-based ISA?



• Example: A system has 16 registers and 4K of memory.

• We need 4 bits to access one of the registers. We also need 12 bits for a memory address.

• If the system is to have 16-bit instructions, we have two choices for our instructions:


Clicker 5.1

• In comparing stack, 1-address, 2-address and 3-address machines, which of these will likely lead to programs with fewer instructions?

• A: Stack• B: 1-address• C: 2-address• D: 3-address


Clicker 5.2

• Continuing with the previous comparison, which of them will likely lead to programs involving the most memory operand accesses (read or write)?

• A: Stack• B: 1-address• C: 2-address• D: 3-address


Clicker 5.3

• Which of these architectures is easier to use (program)?

• A: stack• B: 1-address• C: 2-address• D: 3-address


5.4 Addressing

• Immediate addressing is where the data is part of the instruction: the operand value is already specified in the instruction.

• Direct addressing is where the address of the data is specified in the instruction.

• Register addressing is where the data is located in a register specified in the instruction.

• Indirect addressing is where the address of the address of the data is specified in the instruction.

• Register indirect addressing uses a register to store the address of the address of the data.


5.4 Addressing

• Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the address in the operand to determine the effective address of the data.

• Based addressing is similar except that a base register is used instead of an index register.

• The difference between these two is that an index register holds an offset relative to the address given in the instruction, and a base register holds a base address where the address field specified in the instruction represents a displacement from this base.


5.4 Addressing

• For the instruction shown, what is the value loaded into the accumulator for each addressing mode?


5.4 Addressing

• These are the values loaded into the accumulator for each addressing mode.


5.5 Instruction-Level Pipelining

• Some CPUs divide the fetch-decode-execute cycle into smaller steps.

• These smaller steps for a set of independent instructions can be executed in parallel to increase throughput; this is called ILP (instruction level parallelism).

• Parallel execution can be implemented using the assembly line principle: called pipelining. Pipelining has been the key driving force in improving the performing of processors for over 2 decades.

The next slide shows an example of instruction-level pipelining.


Clicker 5.4

Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from Loadx 100? (indexed)

A: 100B: 200C: 300D: None of the above


Clicker 5.5

Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from Loadi 100? (immediate)



Clicker 5.6

Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from LoadI 100? (indirect)




• Suppose a fetch-decode-execute cycle were broken into the following smaller steps:

• Suppose we have a six-stage pipeline. S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result.

1. Fetch instruction. 4. Fetch operands.2. Decode opcode. 5. Execute instruction.3. Calculate effective 6. Store result.

address of operands.



• For every clock cycle, one small step is carried out, and the stages are overlapped.

S1. Fetch instruction. S4. Fetch operands.S2. Decode opcode. S5. Execute.S3. Calculate effective S6. Store result.

address of operands.



• The theoretical speedup offered by a pipeline can be determined as follows:

Let tp be the time per stage. Each instruction represents a task, T, in the pipeline.The first task (instruction) requires k × tp time to complete in a k-stage pipeline. The remaining (n - 1) tasks emerge from the pipeline one per cycle. So the total time to complete the remaining tasks is (n - 1)tp.Thus, to complete n tasks using a k-stage pipeline requires:

(k × tp) + (n - 1)tp = (k + n - 1)tp.



• If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:

• If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:



• Our neat equations take a number of things for granted.

• First, we have to assume that the architecture supports fetching instructions and data in parallel.

• Second, we assume that the pipeline can be kept filled at all times. This is not always the case. Pipeline hazards arise that cause pipeline conflicts and stalls.



• An instruction pipeline may stall, or be flushed for any of the following reasons:

– Resource conflicts: instructions may need the same component (such as bus/ALU).– Data dependencies: the next (or a future) instruction depends on the result of the current instruction.– Conditional branching (control dependency): a jump instruction depends on the result of the current instruction.

• Measures can be taken at the software level as well as at the hardware level to reduce the effects of these hazards, but they cannot be totally eliminated.


Clicker 5.7

Suppose a pipelined processor involves 10 segments. What is the maximum speedup that can be attained due to pipelining?

A: 1B: 5C: 10D: > 10


5.6 Real-World Examples of ISAs

• We return briefly to the Intel and MIPS architectures from the last chapter, using some of the ideas introduced in this chapter.

• Intel introduced pipelining to their processor line with its Pentium chip.

• The first Pentium had two five-stage pipelines. Each subsequent Pentium processor had a longer pipeline than its predecessor with the Pentium IV having a 24-stage pipeline.

• The Itanium (IA-64) has only a 10-stage pipeline. [What implications do you deduce from this?]


5.6 Real-World Examples of ISAs

• Intel processors support a wide array of addressing modes.

• The original 8086 provided 17 ways to address memory, most of them variants on the methods presented in this chapter.

• Owing to their need for backward compatibility, the Pentium chips also support these 17 addressing modes.

• The Itanium, having a RISC core, supports only one: register indirect addressing with optional post increment.


Addendum: Pentium ISP

• Programmable Registersax = ah:al and similarly for the restEIP = PC (in MARIE)

081631EAXEDXECXEBXEBPESIEDIESP

AXDXCXBXBPSIDISP

AH ALDHCHBH

DLCLBL

Eight 32- bit General Registers

EFLAGSEIP

Two Special Purpose Registers


Addendum Pentium ISP

• The eight general registers can be used to buffer program variables so that more often used data are locally stored in theprocessor, under the control of the programmer (as coded).

• We can interpret these as 8 accumulators, instead of only 1 in MARIE. This improves programmability and performance (due to locality of program data).

• Technology can support hundred times this amount of registers. The challenge if the programmer/compiler can manage them well in programming.

• The esp register plays a special role: it points to the top of the stack used by the program. Notice a stack exists even though the Pentium is not a stack-computer.



• Notable details– Two-operand instructions (CISC)– Memory-mapped I/O; data movement and

arithmetic instructions can affect I/O page in memory

– Stack-based procedure call and return (instead of JNS used in MARIE)



• Operand is specified by the following addressing modes, used individually or in combination:

• Base:– {eax, ecx, edx, ebx, ebp, esi, edi, esp}

• Index * scale:– {eax, ecx, edx, ebx, ebp, esi, edi} * {1, 2, 4, 8}– A scale factor of 1/2/4/8 allows traversing data stored

as 1/2/4/8 bytes in consecutive memory locations.• Displacement:

– 0 or 8 or 32 bit displacement directly specified.


Addendem Pentium ISP

• Examples:– [ebx+esi+4]: Data is at memory address given by ebx +

esi + 4; this is an example of base-indexed addressing with displacement. For example, ebx can be the starting address of a list of characters, esi is the index to the ith character of the list, and 4 is the 4th character after that.

– ebx: Without the bracket (enclosing the register ebx), the operand is in the register ebx itself. Hence whenever [ ... ] appears, it refers to a memory operand, otherwise it refers to a register operand.

– ebx + esi + 4 is not a valid operand: it is neither in memory nor in a register. The assembler will flag this as syntax error.



• Data Movement Instruction– mov ax, bx

• This instruction moves the 2-byte data from register bx to register ax

– mov ax, [ebx]• The source operand is from memory whose address is

specified in register ebx.• This instruction moves the 2-byte data from that address

specified by ebx to register ax.• This is similar to the Load instruction in MARIE, except

that it is more flexible as the destination register can be specified explicitly.



– mov [ebx], ax• This does the reverse move (i.e., Store) from register ax

to memory location specified by ebx– mov [ebx], eax

• This differs from the previous only in the size of the operand; this time it moves 4 bytes as the source register eax is 4 bytes.

– mov ah, ‘A’• This moves a 1-byte (immediate) data into register ah;

the 1-byte data is the ASCII code of letter A (i.e., 4116).


Clicker 5.8

Suppose M[100] = 200, ebx = 100, eax = 200.What is the result of executing mov eax, [ebx]?

A: eax = 200B: ebx = 200C: eax = 100D: ebx = 100


Clicker 5.9

Suppose ebx = 1000 (hex), eax = 20000(hex).What is the result of executing mov ax, bx?

A: eax = 1000 (hex)B: eax = 21000 (hex)C: eax = 30000 (hex)D: None of the above


Clicker 5.10

Suppose eax = 12345678(hex).What is the result in executing mov ax, ’30’?

A: eax = 30B: eax = 3330C: eax = 12345630D: eax = 12343330


Addendeum Pentium ISP• segment .data ;data segment• msg db ‘Hello World! PleaseType Your ID’,0xA• len equ $ - msg ; length of message • segment .bss ;uninitialized data • id resb 10 ; reserve 10 bytes for ID• segment .text ; code segment• global _start ; global program name • _start: ; program entry• mov eax, 4 ; select kernel call #4 • mov ebx, 1 ; default output device• mov ecx, msg ; pointer to message • mov edx, len ; third argument: length• int 0x80 ; invoke kernel to write• mov eax, 3 ; select kernel call #3 • mov ebx, 0 ; default input device• mov ecx, id ; pointer to id• int 0x80 ; invoke kernel to read• mov eax, 4 ; select kernel call #4 • mov ebx, 1• int 0x80 ; echo id read• exit: mov eax, 1 ; select system call #1• int 0x80 ; invoke kernel call


Clicker 5.11

Count the number of distinct immediate operands in the previous program.

A: 4B: 5C: 6D: 7E: >7



• Arithmetic/Logic Instructions– add eax, ebx

• add the content of register ebx to the content of register eax.

– Before (in hex): eax = 12345678 ebx = 11111111– After (in hex): eax = 23456789 ebx = 11111111

– sub eax, [ebx]• Subtract the content of memory location specified by

ebx from the content of register eax– Before (in hex): eax = 12345678 [ebx] = 11111111– After (in hex): eax = 01234567



• mul bx– dx:ax = ax * bx– This instruction assumes ax as a default

operand.– Multiplying 2-byte with 2-byte creates a 4-byte

result to be stored in the register pair dx:ax• Before (in hex): ax = FFFF bx = 0010• After (in hex): dx = 000F ax = FFF0


Clicker 5.12

Suppose eax = 1234, ebx = 1010, ecx = 2000.What is the result of executing sub bx, ax?

A: ebx = FEDCB: ebx = FFFFFEDCC: ebx = 2044D: None of the above


Clicker 5.13

Suppose eax = 1010, ebx = 101, edx = 0 (all in hex). What is the result of executing mul bl?

A: eax = 1010B: edx = 102020C: eax = 102020D: None of the above



• Control Flow Instructions– jmp X: jump to instruction at address X. – jz X: jump to X if the Z flag = 1 (previous

arithmetic/logic result is zero).– jg X: jump to X if the previous arithmetic/logic result is

greater than zero.– jle X: jump to X if the previous arithmetic/logic result is

less than or equal to zero. – jne X: jump to X if the previous arithmetic/logic result is

not equal to zero.– Among others.


Clicker 5.14

Suppose <S,C,O,Z> = <1,0,1,0>.What is the result of executing jz x?

A: the jump will succeed (i.e. eip becomes x)B: the jump will not succeed


Addendum Pentium ISP• segment .data• v1 db 1,2,3,4,5,6• v2 db 5,6,7,8,9,0• segment .bss• dotprod resw 1• segment .text• global _asm_main• _asm_main:• mov ecx, 6 ; initialize count to 6• sub esi, esi ; equivalent to mov esi, 0• sub dx, dx ; initialize dx (result) to 0• cont: mov al, [esi+v1] ;based index addressing to vector• mov bl, [esi+v2]• imul bl ; ax = al * bl• add dx, ax ; update the dot product in dx• add esi, 1 ; increment esi by 1• sub ecx, 1 ; decrement ecx by 1• jnz cont ; if ecx ≠ 0, jump to cont• mov [dotprod], dx ; update dot product in memory• exit: mov eax, 1 ; return control to kernel• int 0x80


Addendum Pentium ISP• Write a program fragment that computes the sum of 0 + 1

+ 2 + … + x where x is an integer already stored in register eax [if x = 0 then a result of 0 should be returned in register ebx.]

mov ebx, 0 // initialize ebx to 0again: cmp eax, 0 // integer > 0?

jle done // integer ≤ 0, get outadd ebx, eaxdec eaxjmp again

done ::::::



• Procedure Call/Return via Stack– call subr will lead to the following:

• The return address (address of the instruction that follows this instruction) will be pushed onto the stack

• eip will be changed to subr (this resembles Jump subrin MARIE)

– ret will lead to the following:• The return address is popped from the stack and moved

into eip; hence control returns to the caller that previously has executed call subr

• This can be contrasted with JumpI in MARIE



• Diagram demonstrating procedure call/return

Memory

esp

call subr

Memory

esp return address

executes

eip

Memory

executesret

eip

esp

push eip onto the stack pop eip from the stack



• The stack allows multiple level procedure call and return, including recursive calls (when a procedure calls itself directly or indirectly).

• The stack can also be used as the medium for passing data from the caller to the procedure and vice versa. For example, a caller can push data onto the stack before executing call subr. When the procedure (subr) is entered, the latter can access these data via the stack, assuming there is a clear protocol between the caller and the procedure where and what these data are.

• An example is given next.


Addendum Pentium ISP• Example: The caller pushes two 4-byte data onto the stack before

calling the procedure (sum). The procedure should replace one of these two data with their sum on the stack before returning the caller.

Caller: Procedure sum:push dword[X] sum: push eaxpush dword[Y] mov eax, [esp+12]call sum add [esp+8], eaxpop ebx pop eax

ret

*Note: dword is NASM notation for a 4-byte memory operand.The extra push/pop eax saves/restores the content of register eaxthat belongs to the caller (to avoid contaminating important data that belongs to the caller).



• Snapshots of the Stack in Call/Return:

Upon executing push dword[Y]

esp value of Y

value of X

Upon executing call sum

espvalue of Y

value of X

return address



• More Snapshots of the Stack

Upon executing push eax

esp

value of Y

value of X

return address

eax

Upon executing ret

esp value of X+Y

value of X

isa

Documents

Transcript of isa