isa
description
Transcript of isa
Instruction Set Architectures
Based on slides from Jones & Bartlett
COMP 228, Fall 2009 Instruction Set Architectures 1
Chapter 5 Objectives
• Understand the factors involved in instruction set architecture design.
• Gain familiarity with the diversity of memory addressing modes.
• Understand the concepts of instruction-level pipelining (and parallelism) and its affect upon execution performance.
COMP 228, Fall 2009 Instruction Set Architectures 2
5.1 Introduction
• This chapter builds upon the ideas in Chapter 4.
• We present a detailed look at different instruction formats, operand types, and memory access methods.
• We will see the interrelation between machine organization and instruction formats.
• This leads to a deeper understanding of computer architecture in general.
COMP 228, Fall 2009 Instruction Set Architectures 3
5.2 Instruction Formats
Instruction sets are differentiated by the following:• Number of bits per instruction.• Stack-based or register-based.• Number of explicit operands per instruction.• Operand location.• Types of operations.• Type and size of operands.
COMP 228, Fall 2009 Instruction Set Architectures 4
5.2 Instruction Formats
Instruction set architectures are measured according to:
• Main memory space occupied by a program.
• Instruction complexity.
• Instruction length (in bits).
• Total number of instructions in the instruction set.
COMP 228, Fall 2009 Instruction Set Architectures 5
5.2 Instruction Formats
In designing/using an instruction set, consideration is given to these variations of features:
• Instruction length (#bits per instruction).– Fixed vs variable length.– Variable length allows using shorter encoding of more
frequent instructions.• Number of operands: typically 0,1,2,3.• Number of addressable registers: RISC allows a
larger register file.• Memory organization.
– Byte- or Word-addressed.• Addressing modes.
– Direct, indirect, indexed, or combinations of these.
COMP 228, Fall 2009 Instruction Set Architectures 6
5.2 Instruction Formats
• The next consideration for architecture design concerns how the CPU will store data.
• Three choices:1. A stack architecture: instructions operate on data
stored in a stack; 0-address machine.2. A register architecture: instructions operate on data in
registers/memory: 1-address, 2-address, 3-address machines.In choosing one over the other, the tradeoffs include: simplicity (and cost) of hardware design, execution speed (performance) and ease of use (programmability).
COMP 228, Fall 2009 Instruction Set Architectures 7
5.2 Instruction Formats
• In a stack architecture, instructions and operands are implicitly taken from the stack.– A stack is a storage where data is pushed onto and popped from
the top (of the stack).• In an accumulator architecture (1-address), one of the
operands of a binary operation is implicitly in the accumulator.– The other operand is in memory, creating lots of bus traffic.– It also limits the number of data temporally stored in the
processor.• In a general purpose register (GPR) architecture (2- or
3- address machine), the operands are explicitly specified.– It is faster than accumulator architecture (data needed can be
completely local).
COMP 228, Fall 2009 Instruction Set Architectures 8
Stack Arithmetic
SP (Top of Stack Pointer)
Memory Memory
Memory
(after push x)
SP x
Memory
y
(after push y)
SP
(after add)
SP x + y
:::: ::::
::::x
::::
COMP 228, Fall 2009 Instruction Set Architectures 9
5.2 Instruction Formats
• Most systems today are GPR systems.• There are three types:
– Memory-memory where two or three (2 source and 1 destination) operands may be in memory.
– Register-memory where at least one operand must be in a register.
– Load-store where no operands may be in memory. This is typically found in RISC architecture.
• The number of operands and the number of available registers has a direct affect on instruction length (space cost) and execution time (time cost).
COMP 228, Fall 2009 Instruction Set Architectures 10
5.2 Instruction Formats
• Stack machines use one - and zero-operand instructions.
• LOAD (Push) and STORE (POP) instructions require a single memory address operand.
• Other instructions use operands from the stack implicitly.
• PUSH and POP operations involve only the stack’s top element.
• Binary instructions (e.g., ADD, MULT) use the top two items on the stack, each involving 2 pop and 1 push operations in implementation.
COMP 228, Fall 2009 Instruction Set Architectures 11
5.2 Instruction Formats
• Stack architectures require us to think about arithmetic expressions a little differently.
• We are accustomed to writing expressions using infix notation, such as: Z = X + Y.
• Stack arithmetic corresponds to using the postfixnotation: Z = XY+.– This is also called reverse Polish notation, (somewhat) in
honor of its Polish inventor, Jan Lukasiewicz (1878 -1956).
COMP 228, Fall 2009 Instruction Set Architectures 12
5.2 Instruction Formats
• The principal advantage of postfix notation is that parentheses are not used, and has an encoding structure that mimics the corresponding stack operations.
• For example, the infix expression, Z = (X × Y) + (W × U),
becomes: Z = X Y × W U × +
in postfix notation.
COMP 228, Fall 2009 Instruction Set Architectures 13
5.2 Instruction Formats
• In a stack ISA, the postfix expression, Z = X Y × W U × +
immediately translates into the following code:PUSH XPUSH YMULTPUSH WPUSH UMULTADDPOP Z
Note: MULT involves popping the stack twice and pushing the result back onto the stack
COMP 228, Fall 2009 Instruction Set Architectures 14
5.2 Instruction Formats
• In comparing with a one-address ISA, like MARIE, the infix expression,
Z = X × Y + W × Uthe translated code becomes:
LOAD XMULT YSTORE TEMPLOAD WMULT UADD TEMPSTORE Z
COMP 228, Fall 2009 Instruction Set Architectures 15
5.2 Instruction Formats
• In a two-address ISA, (e.g.,Intel, Motorola), the infix expression,
Z = X × Y + W × Uis translated into:
LOAD R1,XMULT R1,YLOAD R2,WMULT R2,UADD R1,R2STORE Z,R1
Note: One-address ISAs often require one operand to be a register.
COMP 228, Fall 2009 Instruction Set Architectures 16
5.2 Instruction Formats
• With a three-address ISA, (e.g.,mainframes), the infix expression,
Z = X × Y + W × Uis translated into:
MULT R1,X,YMULT R2,W,UADD Z,R1,R2
Would this program execute faster than the corresponding (longer) program that we saw in the stack-based ISA?
COMP 228, Fall 2009 Instruction Set Architectures 17
5.2 Instruction Formats
• Example: A system has 16 registers and 4K of memory.
• We need 4 bits to access one of the registers. We also need 12 bits for a memory address.
• If the system is to have 16-bit instructions, we have two choices for our instructions:
COMP 228, Fall 2009 Instruction Set Architectures 18
Clicker 5.1
• In comparing stack, 1-address, 2-address and 3-address machines, which of these will likely lead to programs with fewer instructions?
• A: Stack• B: 1-address• C: 2-address• D: 3-address
COMP 228, Fall 2009 Instruction Set Architectures 19
Clicker 5.2
• Continuing with the previous comparison, which of them will likely lead to programs involving the most memory operand accesses (read or write)?
• A: Stack• B: 1-address• C: 2-address• D: 3-address
COMP 228, Fall 2009 Instruction Set Architectures 20
Clicker 5.3
• Which of these architectures is easier to use (program)?
• A: stack• B: 1-address• C: 2-address• D: 3-address
COMP 228, Fall 2009 Instruction Set Architectures 21
5.4 Addressing
• Immediate addressing is where the data is part of the instruction: the operand value is already specified in the instruction.
• Direct addressing is where the address of the data is specified in the instruction.
• Register addressing is where the data is located in a register specified in the instruction.
• Indirect addressing is where the address of the address of the data is specified in the instruction.
• Register indirect addressing uses a register to store the address of the address of the data.
COMP 228, Fall 2009 Instruction Set Architectures 22
5.4 Addressing
• Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the address in the operand to determine the effective address of the data.
• Based addressing is similar except that a base register is used instead of an index register.
• The difference between these two is that an index register holds an offset relative to the address given in the instruction, and a base register holds a base address where the address field specified in the instruction represents a displacement from this base.
COMP 228, Fall 2009 Instruction Set Architectures 23
5.4 Addressing
• For the instruction shown, what is the value loaded into the accumulator for each addressing mode?
COMP 228, Fall 2009 Instruction Set Architectures 24
5.4 Addressing
• These are the values loaded into the accumulator for each addressing mode.
COMP 228, Fall 2009 Instruction Set Architectures 25
5.5 Instruction-Level Pipelining
• Some CPUs divide the fetch-decode-execute cycle into smaller steps.
• These smaller steps for a set of independent instructions can be executed in parallel to increase throughput; this is called ILP (instruction level parallelism).
• Parallel execution can be implemented using the assembly line principle: called pipelining. Pipelining has been the key driving force in improving the performing of processors for over 2 decades.
The next slide shows an example of instruction-level pipelining.
COMP 228, Fall 2009 Instruction Set Architectures 26
Clicker 5.4
Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from Loadx 100? (indexed)
A: 100B: 200C: 300D: None of the above
COMP 228, Fall 2009 Instruction Set Architectures 27
Clicker 5.5
Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from Loadi 100? (immediate)
A: 100B: 200C: 300D: None of the above
COMP 228, Fall 2009 Instruction Set Architectures 28
Clicker 5.6
Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from LoadI 100? (indirect)
A: 100B: 200C: 300D: None of the above
COMP 228, Fall 2009 Instruction Set Architectures 29
5.5 Instruction-Level Pipelining
• Suppose a fetch-decode-execute cycle were broken into the following smaller steps:
• Suppose we have a six-stage pipeline. S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result.
1. Fetch instruction. 4. Fetch operands.2. Decode opcode. 5. Execute instruction.3. Calculate effective 6. Store result.
address of operands.
COMP 228, Fall 2009 Instruction Set Architectures 30
5.5 Instruction-Level Pipelining
• For every clock cycle, one small step is carried out, and the stages are overlapped.
S1. Fetch instruction. S4. Fetch operands.S2. Decode opcode. S5. Execute.S3. Calculate effective S6. Store result.
address of operands.
COMP 228, Fall 2009 Instruction Set Architectures 31
5.5 Instruction-Level Pipelining
• The theoretical speedup offered by a pipeline can be determined as follows:
Let tp be the time per stage. Each instruction represents a task, T, in the pipeline.The first task (instruction) requires k × tp time to complete in a k-stage pipeline. The remaining (n - 1) tasks emerge from the pipeline one per cycle. So the total time to complete the remaining tasks is (n - 1)tp.Thus, to complete n tasks using a k-stage pipeline requires:
(k × tp) + (n - 1)tp = (k + n - 1)tp.
COMP 228, Fall 2009 Instruction Set Architectures 32
5.5 Instruction-Level Pipelining
• If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:
• If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:
COMP 228, Fall 2009 Instruction Set Architectures 33
5.5 Instruction-Level Pipelining
• Our neat equations take a number of things for granted.
• First, we have to assume that the architecture supports fetching instructions and data in parallel.
• Second, we assume that the pipeline can be kept filled at all times. This is not always the case. Pipeline hazards arise that cause pipeline conflicts and stalls.
COMP 228, Fall 2009 Instruction Set Architectures 34
5.5 Instruction-Level Pipelining
• An instruction pipeline may stall, or be flushed for any of the following reasons:
– Resource conflicts: instructions may need the same component (such as bus/ALU).– Data dependencies: the next (or a future) instruction depends on the result of the current instruction.– Conditional branching (control dependency): a jump instruction depends on the result of the current instruction.
• Measures can be taken at the software level as well as at the hardware level to reduce the effects of these hazards, but they cannot be totally eliminated.
COMP 228, Fall 2009 Instruction Set Architectures 35
Clicker 5.7
Suppose a pipelined processor involves 10 segments. What is the maximum speedup that can be attained due to pipelining?
A: 1B: 5C: 10D: > 10
COMP 228, Fall 2009 Instruction Set Architectures 36
5.6 Real-World Examples of ISAs
• We return briefly to the Intel and MIPS architectures from the last chapter, using some of the ideas introduced in this chapter.
• Intel introduced pipelining to their processor line with its Pentium chip.
• The first Pentium had two five-stage pipelines. Each subsequent Pentium processor had a longer pipeline than its predecessor with the Pentium IV having a 24-stage pipeline.
• The Itanium (IA-64) has only a 10-stage pipeline. [What implications do you deduce from this?]
COMP 228, Fall 2009 Instruction Set Architectures 37
5.6 Real-World Examples of ISAs
• Intel processors support a wide array of addressing modes.
• The original 8086 provided 17 ways to address memory, most of them variants on the methods presented in this chapter.
• Owing to their need for backward compatibility, the Pentium chips also support these 17 addressing modes.
• The Itanium, having a RISC core, supports only one: register indirect addressing with optional post increment.
COMP 228, Fall 2009 Instruction Set Architectures 38
Addendum: Pentium ISP
• Programmable Registersax = ah:al and similarly for the restEIP = PC (in MARIE)
081631EAXEDXECXEBXEBPESIEDIESP
AXDXCXBXBPSIDISP
AH ALDHCHBH
DLCLBL
Eight 32- bit General Registers
EFLAGSEIP
Two Special Purpose Registers
COMP 228, Fall 2009 Instruction Set Architectures 39
Addendum Pentium ISP
• The eight general registers can be used to buffer program variables so that more often used data are locally stored in theprocessor, under the control of the programmer (as coded).
• We can interpret these as 8 accumulators, instead of only 1 in MARIE. This improves programmability and performance (due to locality of program data).
• Technology can support hundred times this amount of registers. The challenge if the programmer/compiler can manage them well in programming.
• The esp register plays a special role: it points to the top of the stack used by the program. Notice a stack exists even though the Pentium is not a stack-computer.
COMP 228, Fall 2009 Instruction Set Architectures 40
Addendum Pentium ISP
• Notable details– Two-operand instructions (CISC)– Memory-mapped I/O; data movement and
arithmetic instructions can affect I/O page in memory
– Stack-based procedure call and return (instead of JNS used in MARIE)
COMP 228, Fall 2009 Instruction Set Architectures 41
Addendum Pentium ISP
• Operand is specified by the following addressing modes, used individually or in combination:
• Base:– {eax, ecx, edx, ebx, ebp, esi, edi, esp}
• Index * scale:– {eax, ecx, edx, ebx, ebp, esi, edi} * {1, 2, 4, 8}– A scale factor of 1/2/4/8 allows traversing data stored
as 1/2/4/8 bytes in consecutive memory locations.• Displacement:
– 0 or 8 or 32 bit displacement directly specified.
COMP 228, Fall 2009 Instruction Set Architectures 42
Addendem Pentium ISP
• Examples:– [ebx+esi+4]: Data is at memory address given by ebx +
esi + 4; this is an example of base-indexed addressing with displacement. For example, ebx can be the starting address of a list of characters, esi is the index to the ith character of the list, and 4 is the 4th character after that.
– ebx: Without the bracket (enclosing the register ebx), the operand is in the register ebx itself. Hence whenever [ ... ] appears, it refers to a memory operand, otherwise it refers to a register operand.
– ebx + esi + 4 is not a valid operand: it is neither in memory nor in a register. The assembler will flag this as syntax error.
COMP 228, Fall 2009 Instruction Set Architectures 43
Addendum Pentium ISP
• Data Movement Instruction– mov ax, bx
• This instruction moves the 2-byte data from register bx to register ax
– mov ax, [ebx]• The source operand is from memory whose address is
specified in register ebx.• This instruction moves the 2-byte data from that address
specified by ebx to register ax.• This is similar to the Load instruction in MARIE, except
that it is more flexible as the destination register can be specified explicitly.
COMP 228, Fall 2009 Instruction Set Architectures 44
Addendum Pentium ISP
– mov [ebx], ax• This does the reverse move (i.e., Store) from register ax
to memory location specified by ebx– mov [ebx], eax
• This differs from the previous only in the size of the operand; this time it moves 4 bytes as the source register eax is 4 bytes.
– mov ah, ‘A’• This moves a 1-byte (immediate) data into register ah;
the 1-byte data is the ASCII code of letter A (i.e., 4116).
COMP 228, Fall 2009 Instruction Set Architectures 45
Clicker 5.8
Suppose M[100] = 200, ebx = 100, eax = 200.What is the result of executing mov eax, [ebx]?
A: eax = 200B: ebx = 200C: eax = 100D: ebx = 100
COMP 228, Fall 2009 Instruction Set Architectures 46
Clicker 5.9
Suppose ebx = 1000 (hex), eax = 20000(hex).What is the result of executing mov ax, bx?
A: eax = 1000 (hex)B: eax = 21000 (hex)C: eax = 30000 (hex)D: None of the above
COMP 228, Fall 2009 Instruction Set Architectures 47
Clicker 5.10
Suppose eax = 12345678(hex).What is the result in executing mov ax, ’30’?
A: eax = 30B: eax = 3330C: eax = 12345630D: eax = 12343330
COMP 228, Fall 2009 Instruction Set Architectures 48
Addendeum Pentium ISP• segment .data ;data segment• msg db ‘Hello World! PleaseType Your ID’,0xA• len equ $ - msg ; length of message • segment .bss ;uninitialized data • id resb 10 ; reserve 10 bytes for ID• segment .text ; code segment• global _start ; global program name • _start: ; program entry• mov eax, 4 ; select kernel call #4 • mov ebx, 1 ; default output device• mov ecx, msg ; pointer to message • mov edx, len ; third argument: length• int 0x80 ; invoke kernel to write• mov eax, 3 ; select kernel call #3 • mov ebx, 0 ; default input device• mov ecx, id ; pointer to id• int 0x80 ; invoke kernel to read• mov eax, 4 ; select kernel call #4 • mov ebx, 1• int 0x80 ; echo id read• exit: mov eax, 1 ; select system call #1• int 0x80 ; invoke kernel call
COMP 228, Fall 2009 Instruction Set Architectures 49
Clicker 5.11
Count the number of distinct immediate operands in the previous program.
A: 4B: 5C: 6D: 7E: >7
COMP 228, Fall 2009 Instruction Set Architectures 50
Addendum Pentium ISP
• Arithmetic/Logic Instructions– add eax, ebx
• add the content of register ebx to the content of register eax.
– Before (in hex): eax = 12345678 ebx = 11111111– After (in hex): eax = 23456789 ebx = 11111111
– sub eax, [ebx]• Subtract the content of memory location specified by
ebx from the content of register eax– Before (in hex): eax = 12345678 [ebx] = 11111111– After (in hex): eax = 01234567
COMP 228, Fall 2009 Instruction Set Architectures 51
Addendum Pentium ISP
• mul bx– dx:ax = ax * bx– This instruction assumes ax as a default
operand.– Multiplying 2-byte with 2-byte creates a 4-byte
result to be stored in the register pair dx:ax• Before (in hex): ax = FFFF bx = 0010• After (in hex): dx = 000F ax = FFF0
COMP 228, Fall 2009 Instruction Set Architectures 52
Clicker 5.12
Suppose eax = 1234, ebx = 1010, ecx = 2000.What is the result of executing sub bx, ax?
A: ebx = FEDCB: ebx = FFFFFEDCC: ebx = 2044D: None of the above
COMP 228, Fall 2009 Instruction Set Architectures 53
Clicker 5.13
Suppose eax = 1010, ebx = 101, edx = 0 (all in hex). What is the result of executing mul bl?
A: eax = 1010B: edx = 102020C: eax = 102020D: None of the above
COMP 228, Fall 2009 Instruction Set Architectures 54
Addendum Pentium ISP
• Control Flow Instructions– jmp X: jump to instruction at address X. – jz X: jump to X if the Z flag = 1 (previous
arithmetic/logic result is zero).– jg X: jump to X if the previous arithmetic/logic result is
greater than zero.– jle X: jump to X if the previous arithmetic/logic result is
less than or equal to zero. – jne X: jump to X if the previous arithmetic/logic result is
not equal to zero.– Among others.
COMP 228, Fall 2009 Instruction Set Architectures 55
Clicker 5.14
Suppose <S,C,O,Z> = <1,0,1,0>.What is the result of executing jz x?
A: the jump will succeed (i.e. eip becomes x)B: the jump will not succeed
COMP 228, Fall 2009 Instruction Set Architectures 56
Addendum Pentium ISP• segment .data• v1 db 1,2,3,4,5,6• v2 db 5,6,7,8,9,0• segment .bss• dotprod resw 1• segment .text• global _asm_main• _asm_main:• mov ecx, 6 ; initialize count to 6• sub esi, esi ; equivalent to mov esi, 0• sub dx, dx ; initialize dx (result) to 0• cont: mov al, [esi+v1] ;based index addressing to vector• mov bl, [esi+v2]• imul bl ; ax = al * bl• add dx, ax ; update the dot product in dx• add esi, 1 ; increment esi by 1• sub ecx, 1 ; decrement ecx by 1• jnz cont ; if ecx ≠ 0, jump to cont• mov [dotprod], dx ; update dot product in memory• exit: mov eax, 1 ; return control to kernel• int 0x80
COMP 228, Fall 2009 Instruction Set Architectures 57
Addendum Pentium ISP• Write a program fragment that computes the sum of 0 + 1
+ 2 + … + x where x is an integer already stored in register eax [if x = 0 then a result of 0 should be returned in register ebx.]
mov ebx, 0 // initialize ebx to 0again: cmp eax, 0 // integer > 0?
jle done // integer ≤ 0, get outadd ebx, eaxdec eaxjmp again
done ::::::
COMP 228, Fall 2009 Instruction Set Architectures 58
Addendum Pentium ISP
• Procedure Call/Return via Stack– call subr will lead to the following:
• The return address (address of the instruction that follows this instruction) will be pushed onto the stack
• eip will be changed to subr (this resembles Jump subrin MARIE)
– ret will lead to the following:• The return address is popped from the stack and moved
into eip; hence control returns to the caller that previously has executed call subr
• This can be contrasted with JumpI in MARIE
COMP 228, Fall 2009 Instruction Set Architectures 59
Addendum Pentium ISP
• Diagram demonstrating procedure call/return
Memory
esp
call subr
Memory
esp return address
executes
eip
Memory
executesret
eip
esp
push eip onto the stack pop eip from the stack
COMP 228, Fall 2009 Instruction Set Architectures 60
Addendum Pentium ISP
• The stack allows multiple level procedure call and return, including recursive calls (when a procedure calls itself directly or indirectly).
• The stack can also be used as the medium for passing data from the caller to the procedure and vice versa. For example, a caller can push data onto the stack before executing call subr. When the procedure (subr) is entered, the latter can access these data via the stack, assuming there is a clear protocol between the caller and the procedure where and what these data are.
• An example is given next.
COMP 228, Fall 2009 Instruction Set Architectures 61
Addendum Pentium ISP• Example: The caller pushes two 4-byte data onto the stack before
calling the procedure (sum). The procedure should replace one of these two data with their sum on the stack before returning the caller.
Caller: Procedure sum:push dword[X] sum: push eaxpush dword[Y] mov eax, [esp+12]call sum add [esp+8], eaxpop ebx pop eax
ret
*Note: dword is NASM notation for a 4-byte memory operand.The extra push/pop eax saves/restores the content of register eaxthat belongs to the caller (to avoid contaminating important data that belongs to the caller).
COMP 228, Fall 2009 Instruction Set Architectures 62
Addendum Pentium ISP
• Snapshots of the Stack in Call/Return:
Upon executing push dword[Y]
esp value of Y
value of X
Upon executing call sum
espvalue of Y
value of X
return address
COMP 228, Fall 2009 Instruction Set Architectures 63
Addendum Pentium ISP
• More Snapshots of the Stack
Upon executing push eax
esp
value of Y
value of X
return address
eax
Upon executing ret
esp value of X+Y
value of X