isa

64
Instruction Set Architectures Based on slides from Jones & Bartlett

description

Instruction Set Architectures Based on slides from Jones & Bartlett • Understand the factors involved in instruction set architecture design. Chapter 5 Objectives Instruction Set Architectures 1 COMP 228, Fall 2009 • We will see the interrelation between machine organization and instruction formats. • This leads to a deeper understanding of computer architecture in general. • This chapter builds upon the ideas in Chapter 4. Instruction Set Architectures 2 COMP 228, Fall 2009

Transcript of isa

Page 1: isa

Instruction Set Architectures

Based on slides from Jones & Bartlett

Page 2: isa

COMP 228, Fall 2009 Instruction Set Architectures 1

Chapter 5 Objectives

• Understand the factors involved in instruction set architecture design.

• Gain familiarity with the diversity of memory addressing modes.

• Understand the concepts of instruction-level pipelining (and parallelism) and its affect upon execution performance.

Page 3: isa

COMP 228, Fall 2009 Instruction Set Architectures 2

5.1 Introduction

• This chapter builds upon the ideas in Chapter 4.

• We present a detailed look at different instruction formats, operand types, and memory access methods.

• We will see the interrelation between machine organization and instruction formats.

• This leads to a deeper understanding of computer architecture in general.

Page 4: isa

COMP 228, Fall 2009 Instruction Set Architectures 3

5.2 Instruction Formats

Instruction sets are differentiated by the following:• Number of bits per instruction.• Stack-based or register-based.• Number of explicit operands per instruction.• Operand location.• Types of operations.• Type and size of operands.

Page 5: isa

COMP 228, Fall 2009 Instruction Set Architectures 4

5.2 Instruction Formats

Instruction set architectures are measured according to:

• Main memory space occupied by a program.

• Instruction complexity.

• Instruction length (in bits).

• Total number of instructions in the instruction set.

Page 6: isa

COMP 228, Fall 2009 Instruction Set Architectures 5

5.2 Instruction Formats

In designing/using an instruction set, consideration is given to these variations of features:

• Instruction length (#bits per instruction).– Fixed vs variable length.– Variable length allows using shorter encoding of more

frequent instructions.• Number of operands: typically 0,1,2,3.• Number of addressable registers: RISC allows a

larger register file.• Memory organization.

– Byte- or Word-addressed.• Addressing modes.

– Direct, indirect, indexed, or combinations of these.

Page 7: isa

COMP 228, Fall 2009 Instruction Set Architectures 6

5.2 Instruction Formats

• The next consideration for architecture design concerns how the CPU will store data.

• Three choices:1. A stack architecture: instructions operate on data

stored in a stack; 0-address machine.2. A register architecture: instructions operate on data in

registers/memory: 1-address, 2-address, 3-address machines.In choosing one over the other, the tradeoffs include: simplicity (and cost) of hardware design, execution speed (performance) and ease of use (programmability).

Page 8: isa

COMP 228, Fall 2009 Instruction Set Architectures 7

5.2 Instruction Formats

• In a stack architecture, instructions and operands are implicitly taken from the stack.– A stack is a storage where data is pushed onto and popped from

the top (of the stack).• In an accumulator architecture (1-address), one of the

operands of a binary operation is implicitly in the accumulator.– The other operand is in memory, creating lots of bus traffic.– It also limits the number of data temporally stored in the

processor.• In a general purpose register (GPR) architecture (2- or

3- address machine), the operands are explicitly specified.– It is faster than accumulator architecture (data needed can be

completely local).

Page 9: isa

COMP 228, Fall 2009 Instruction Set Architectures 8

Stack Arithmetic

SP (Top of Stack Pointer)

Memory Memory

Memory

(after push x)

SP x

Memory

y

(after push y)

SP

(after add)

SP x + y

:::: ::::

::::x

::::

Page 10: isa

COMP 228, Fall 2009 Instruction Set Architectures 9

5.2 Instruction Formats

• Most systems today are GPR systems.• There are three types:

– Memory-memory where two or three (2 source and 1 destination) operands may be in memory.

– Register-memory where at least one operand must be in a register.

– Load-store where no operands may be in memory. This is typically found in RISC architecture.

• The number of operands and the number of available registers has a direct affect on instruction length (space cost) and execution time (time cost).

Page 11: isa

COMP 228, Fall 2009 Instruction Set Architectures 10

5.2 Instruction Formats

• Stack machines use one - and zero-operand instructions.

• LOAD (Push) and STORE (POP) instructions require a single memory address operand.

• Other instructions use operands from the stack implicitly.

• PUSH and POP operations involve only the stack’s top element.

• Binary instructions (e.g., ADD, MULT) use the top two items on the stack, each involving 2 pop and 1 push operations in implementation.

Page 12: isa

COMP 228, Fall 2009 Instruction Set Architectures 11

5.2 Instruction Formats

• Stack architectures require us to think about arithmetic expressions a little differently.

• We are accustomed to writing expressions using infix notation, such as: Z = X + Y.

• Stack arithmetic corresponds to using the postfixnotation: Z = XY+.– This is also called reverse Polish notation, (somewhat) in

honor of its Polish inventor, Jan Lukasiewicz (1878 -1956).

Page 13: isa

COMP 228, Fall 2009 Instruction Set Architectures 12

5.2 Instruction Formats

• The principal advantage of postfix notation is that parentheses are not used, and has an encoding structure that mimics the corresponding stack operations.

• For example, the infix expression, Z = (X × Y) + (W × U),

becomes: Z = X Y × W U × +

in postfix notation.

Page 14: isa

COMP 228, Fall 2009 Instruction Set Architectures 13

5.2 Instruction Formats

• In a stack ISA, the postfix expression, Z = X Y × W U × +

immediately translates into the following code:PUSH XPUSH YMULTPUSH WPUSH UMULTADDPOP Z

Note: MULT involves popping the stack twice and pushing the result back onto the stack

Page 15: isa

COMP 228, Fall 2009 Instruction Set Architectures 14

5.2 Instruction Formats

• In comparing with a one-address ISA, like MARIE, the infix expression,

Z = X × Y + W × Uthe translated code becomes:

LOAD XMULT YSTORE TEMPLOAD WMULT UADD TEMPSTORE Z

Page 16: isa

COMP 228, Fall 2009 Instruction Set Architectures 15

5.2 Instruction Formats

• In a two-address ISA, (e.g.,Intel, Motorola), the infix expression,

Z = X × Y + W × Uis translated into:

LOAD R1,XMULT R1,YLOAD R2,WMULT R2,UADD R1,R2STORE Z,R1

Note: One-address ISAs often require one operand to be a register.

Page 17: isa

COMP 228, Fall 2009 Instruction Set Architectures 16

5.2 Instruction Formats

• With a three-address ISA, (e.g.,mainframes), the infix expression,

Z = X × Y + W × Uis translated into:

MULT R1,X,YMULT R2,W,UADD Z,R1,R2

Would this program execute faster than the corresponding (longer) program that we saw in the stack-based ISA?

Page 18: isa

COMP 228, Fall 2009 Instruction Set Architectures 17

5.2 Instruction Formats

• Example: A system has 16 registers and 4K of memory.

• We need 4 bits to access one of the registers. We also need 12 bits for a memory address.

• If the system is to have 16-bit instructions, we have two choices for our instructions:

Page 19: isa

COMP 228, Fall 2009 Instruction Set Architectures 18

Clicker 5.1

• In comparing stack, 1-address, 2-address and 3-address machines, which of these will likely lead to programs with fewer instructions?

• A: Stack• B: 1-address• C: 2-address• D: 3-address

Page 20: isa

COMP 228, Fall 2009 Instruction Set Architectures 19

Clicker 5.2

• Continuing with the previous comparison, which of them will likely lead to programs involving the most memory operand accesses (read or write)?

• A: Stack• B: 1-address• C: 2-address• D: 3-address

Page 21: isa

COMP 228, Fall 2009 Instruction Set Architectures 20

Clicker 5.3

• Which of these architectures is easier to use (program)?

• A: stack• B: 1-address• C: 2-address• D: 3-address

Page 22: isa

COMP 228, Fall 2009 Instruction Set Architectures 21

5.4 Addressing

• Immediate addressing is where the data is part of the instruction: the operand value is already specified in the instruction.

• Direct addressing is where the address of the data is specified in the instruction.

• Register addressing is where the data is located in a register specified in the instruction.

• Indirect addressing is where the address of the address of the data is specified in the instruction.

• Register indirect addressing uses a register to store the address of the address of the data.

Page 23: isa

COMP 228, Fall 2009 Instruction Set Architectures 22

5.4 Addressing

• Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the address in the operand to determine the effective address of the data.

• Based addressing is similar except that a base register is used instead of an index register.

• The difference between these two is that an index register holds an offset relative to the address given in the instruction, and a base register holds a base address where the address field specified in the instruction represents a displacement from this base.

Page 24: isa

COMP 228, Fall 2009 Instruction Set Architectures 23

5.4 Addressing

• For the instruction shown, what is the value loaded into the accumulator for each addressing mode?

Page 25: isa

COMP 228, Fall 2009 Instruction Set Architectures 24

5.4 Addressing

• These are the values loaded into the accumulator for each addressing mode.

Page 26: isa

COMP 228, Fall 2009 Instruction Set Architectures 25

5.5 Instruction-Level Pipelining

• Some CPUs divide the fetch-decode-execute cycle into smaller steps.

• These smaller steps for a set of independent instructions can be executed in parallel to increase throughput; this is called ILP (instruction level parallelism).

• Parallel execution can be implemented using the assembly line principle: called pipelining. Pipelining has been the key driving force in improving the performing of processors for over 2 decades.

The next slide shows an example of instruction-level pipelining.

Page 27: isa

COMP 228, Fall 2009 Instruction Set Architectures 26

Clicker 5.4

Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from Loadx 100? (indexed)

A: 100B: 200C: 300D: None of the above

Page 28: isa

COMP 228, Fall 2009 Instruction Set Architectures 27

Clicker 5.5

Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from Loadi 100? (immediate)

A: 100B: 200C: 300D: None of the above

Page 29: isa

COMP 228, Fall 2009 Instruction Set Architectures 28

Clicker 5.6

Suppose M[100] = 200, M[200] = 300, andX (index register) = 100.What is the content of AC from LoadI 100? (indirect)

A: 100B: 200C: 300D: None of the above

Page 30: isa

COMP 228, Fall 2009 Instruction Set Architectures 29

5.5 Instruction-Level Pipelining

• Suppose a fetch-decode-execute cycle were broken into the following smaller steps:

• Suppose we have a six-stage pipeline. S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result.

1. Fetch instruction. 4. Fetch operands.2. Decode opcode. 5. Execute instruction.3. Calculate effective 6. Store result.

address of operands.

Page 31: isa

COMP 228, Fall 2009 Instruction Set Architectures 30

5.5 Instruction-Level Pipelining

• For every clock cycle, one small step is carried out, and the stages are overlapped.

S1. Fetch instruction. S4. Fetch operands.S2. Decode opcode. S5. Execute.S3. Calculate effective S6. Store result.

address of operands.

Page 32: isa

COMP 228, Fall 2009 Instruction Set Architectures 31

5.5 Instruction-Level Pipelining

• The theoretical speedup offered by a pipeline can be determined as follows:

Let tp be the time per stage. Each instruction represents a task, T, in the pipeline.The first task (instruction) requires k × tp time to complete in a k-stage pipeline. The remaining (n - 1) tasks emerge from the pipeline one per cycle. So the total time to complete the remaining tasks is (n - 1)tp.Thus, to complete n tasks using a k-stage pipeline requires:

(k × tp) + (n - 1)tp = (k + n - 1)tp.

Page 33: isa

COMP 228, Fall 2009 Instruction Set Architectures 32

5.5 Instruction-Level Pipelining

• If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:

• If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:

Page 34: isa

COMP 228, Fall 2009 Instruction Set Architectures 33

5.5 Instruction-Level Pipelining

• Our neat equations take a number of things for granted.

• First, we have to assume that the architecture supports fetching instructions and data in parallel.

• Second, we assume that the pipeline can be kept filled at all times. This is not always the case. Pipeline hazards arise that cause pipeline conflicts and stalls.

Page 35: isa

COMP 228, Fall 2009 Instruction Set Architectures 34

5.5 Instruction-Level Pipelining

• An instruction pipeline may stall, or be flushed for any of the following reasons:

– Resource conflicts: instructions may need the same component (such as bus/ALU).– Data dependencies: the next (or a future) instruction depends on the result of the current instruction.– Conditional branching (control dependency): a jump instruction depends on the result of the current instruction.

• Measures can be taken at the software level as well as at the hardware level to reduce the effects of these hazards, but they cannot be totally eliminated.

Page 36: isa

COMP 228, Fall 2009 Instruction Set Architectures 35

Clicker 5.7

Suppose a pipelined processor involves 10 segments. What is the maximum speedup that can be attained due to pipelining?

A: 1B: 5C: 10D: > 10

Page 37: isa

COMP 228, Fall 2009 Instruction Set Architectures 36

5.6 Real-World Examples of ISAs

• We return briefly to the Intel and MIPS architectures from the last chapter, using some of the ideas introduced in this chapter.

• Intel introduced pipelining to their processor line with its Pentium chip.

• The first Pentium had two five-stage pipelines. Each subsequent Pentium processor had a longer pipeline than its predecessor with the Pentium IV having a 24-stage pipeline.

• The Itanium (IA-64) has only a 10-stage pipeline. [What implications do you deduce from this?]

Page 38: isa

COMP 228, Fall 2009 Instruction Set Architectures 37

5.6 Real-World Examples of ISAs

• Intel processors support a wide array of addressing modes.

• The original 8086 provided 17 ways to address memory, most of them variants on the methods presented in this chapter.

• Owing to their need for backward compatibility, the Pentium chips also support these 17 addressing modes.

• The Itanium, having a RISC core, supports only one: register indirect addressing with optional post increment.

Page 39: isa

COMP 228, Fall 2009 Instruction Set Architectures 38

Addendum: Pentium ISP

• Programmable Registersax = ah:al and similarly for the restEIP = PC (in MARIE)

081631EAXEDXECXEBXEBPESIEDIESP

AXDXCXBXBPSIDISP

AH ALDHCHBH

DLCLBL

Eight 32- bit General Registers

EFLAGSEIP

Two Special Purpose Registers

Page 40: isa

COMP 228, Fall 2009 Instruction Set Architectures 39

Addendum Pentium ISP

• The eight general registers can be used to buffer program variables so that more often used data are locally stored in theprocessor, under the control of the programmer (as coded).

• We can interpret these as 8 accumulators, instead of only 1 in MARIE. This improves programmability and performance (due to locality of program data).

• Technology can support hundred times this amount of registers. The challenge if the programmer/compiler can manage them well in programming.

• The esp register plays a special role: it points to the top of the stack used by the program. Notice a stack exists even though the Pentium is not a stack-computer.

Page 41: isa

COMP 228, Fall 2009 Instruction Set Architectures 40

Addendum Pentium ISP

• Notable details– Two-operand instructions (CISC)– Memory-mapped I/O; data movement and

arithmetic instructions can affect I/O page in memory

– Stack-based procedure call and return (instead of JNS used in MARIE)

Page 42: isa

COMP 228, Fall 2009 Instruction Set Architectures 41

Addendum Pentium ISP

• Operand is specified by the following addressing modes, used individually or in combination:

• Base:– {eax, ecx, edx, ebx, ebp, esi, edi, esp}

• Index * scale:– {eax, ecx, edx, ebx, ebp, esi, edi} * {1, 2, 4, 8}– A scale factor of 1/2/4/8 allows traversing data stored

as 1/2/4/8 bytes in consecutive memory locations.• Displacement:

– 0 or 8 or 32 bit displacement directly specified.

Page 43: isa

COMP 228, Fall 2009 Instruction Set Architectures 42

Addendem Pentium ISP

• Examples:– [ebx+esi+4]: Data is at memory address given by ebx +

esi + 4; this is an example of base-indexed addressing with displacement. For example, ebx can be the starting address of a list of characters, esi is the index to the ith character of the list, and 4 is the 4th character after that.

– ebx: Without the bracket (enclosing the register ebx), the operand is in the register ebx itself. Hence whenever [ ... ] appears, it refers to a memory operand, otherwise it refers to a register operand.

– ebx + esi + 4 is not a valid operand: it is neither in memory nor in a register. The assembler will flag this as syntax error.

Page 44: isa

COMP 228, Fall 2009 Instruction Set Architectures 43

Addendum Pentium ISP

• Data Movement Instruction– mov ax, bx

• This instruction moves the 2-byte data from register bx to register ax

– mov ax, [ebx]• The source operand is from memory whose address is

specified in register ebx.• This instruction moves the 2-byte data from that address

specified by ebx to register ax.• This is similar to the Load instruction in MARIE, except

that it is more flexible as the destination register can be specified explicitly.

Page 45: isa

COMP 228, Fall 2009 Instruction Set Architectures 44

Addendum Pentium ISP

– mov [ebx], ax• This does the reverse move (i.e., Store) from register ax

to memory location specified by ebx– mov [ebx], eax

• This differs from the previous only in the size of the operand; this time it moves 4 bytes as the source register eax is 4 bytes.

– mov ah, ‘A’• This moves a 1-byte (immediate) data into register ah;

the 1-byte data is the ASCII code of letter A (i.e., 4116).

Page 46: isa

COMP 228, Fall 2009 Instruction Set Architectures 45

Clicker 5.8

Suppose M[100] = 200, ebx = 100, eax = 200.What is the result of executing mov eax, [ebx]?

A: eax = 200B: ebx = 200C: eax = 100D: ebx = 100

Page 47: isa

COMP 228, Fall 2009 Instruction Set Architectures 46

Clicker 5.9

Suppose ebx = 1000 (hex), eax = 20000(hex).What is the result of executing mov ax, bx?

A: eax = 1000 (hex)B: eax = 21000 (hex)C: eax = 30000 (hex)D: None of the above

Page 48: isa

COMP 228, Fall 2009 Instruction Set Architectures 47

Clicker 5.10

Suppose eax = 12345678(hex).What is the result in executing mov ax, ’30’?

A: eax = 30B: eax = 3330C: eax = 12345630D: eax = 12343330

Page 49: isa

COMP 228, Fall 2009 Instruction Set Architectures 48

Addendeum Pentium ISP• segment .data ;data segment• msg db ‘Hello World! PleaseType Your ID’,0xA• len equ $ - msg ; length of message • segment .bss ;uninitialized data • id resb 10 ; reserve 10 bytes for ID• segment .text ; code segment• global _start ; global program name • _start: ; program entry• mov eax, 4 ; select kernel call #4 • mov ebx, 1 ; default output device• mov ecx, msg ; pointer to message • mov edx, len ; third argument: length• int 0x80 ; invoke kernel to write• mov eax, 3 ; select kernel call #3 • mov ebx, 0 ; default input device• mov ecx, id ; pointer to id• int 0x80 ; invoke kernel to read• mov eax, 4 ; select kernel call #4 • mov ebx, 1• int 0x80 ; echo id read• exit: mov eax, 1 ; select system call #1• int 0x80 ; invoke kernel call

Page 50: isa

COMP 228, Fall 2009 Instruction Set Architectures 49

Clicker 5.11

Count the number of distinct immediate operands in the previous program.

A: 4B: 5C: 6D: 7E: >7

Page 51: isa

COMP 228, Fall 2009 Instruction Set Architectures 50

Addendum Pentium ISP

• Arithmetic/Logic Instructions– add eax, ebx

• add the content of register ebx to the content of register eax.

– Before (in hex): eax = 12345678 ebx = 11111111– After (in hex): eax = 23456789 ebx = 11111111

– sub eax, [ebx]• Subtract the content of memory location specified by

ebx from the content of register eax– Before (in hex): eax = 12345678 [ebx] = 11111111– After (in hex): eax = 01234567

Page 52: isa

COMP 228, Fall 2009 Instruction Set Architectures 51

Addendum Pentium ISP

• mul bx– dx:ax = ax * bx– This instruction assumes ax as a default

operand.– Multiplying 2-byte with 2-byte creates a 4-byte

result to be stored in the register pair dx:ax• Before (in hex): ax = FFFF bx = 0010• After (in hex): dx = 000F ax = FFF0

Page 53: isa

COMP 228, Fall 2009 Instruction Set Architectures 52

Clicker 5.12

Suppose eax = 1234, ebx = 1010, ecx = 2000.What is the result of executing sub bx, ax?

A: ebx = FEDCB: ebx = FFFFFEDCC: ebx = 2044D: None of the above

Page 54: isa

COMP 228, Fall 2009 Instruction Set Architectures 53

Clicker 5.13

Suppose eax = 1010, ebx = 101, edx = 0 (all in hex). What is the result of executing mul bl?

A: eax = 1010B: edx = 102020C: eax = 102020D: None of the above

Page 55: isa

COMP 228, Fall 2009 Instruction Set Architectures 54

Addendum Pentium ISP

• Control Flow Instructions– jmp X: jump to instruction at address X. – jz X: jump to X if the Z flag = 1 (previous

arithmetic/logic result is zero).– jg X: jump to X if the previous arithmetic/logic result is

greater than zero.– jle X: jump to X if the previous arithmetic/logic result is

less than or equal to zero. – jne X: jump to X if the previous arithmetic/logic result is

not equal to zero.– Among others.

Page 56: isa

COMP 228, Fall 2009 Instruction Set Architectures 55

Clicker 5.14

Suppose <S,C,O,Z> = <1,0,1,0>.What is the result of executing jz x?

A: the jump will succeed (i.e. eip becomes x)B: the jump will not succeed

Page 57: isa

COMP 228, Fall 2009 Instruction Set Architectures 56

Addendum Pentium ISP• segment .data• v1 db 1,2,3,4,5,6• v2 db 5,6,7,8,9,0• segment .bss• dotprod resw 1• segment .text• global _asm_main• _asm_main:• mov ecx, 6 ; initialize count to 6• sub esi, esi ; equivalent to mov esi, 0• sub dx, dx ; initialize dx (result) to 0• cont: mov al, [esi+v1] ;based index addressing to vector• mov bl, [esi+v2]• imul bl ; ax = al * bl• add dx, ax ; update the dot product in dx• add esi, 1 ; increment esi by 1• sub ecx, 1 ; decrement ecx by 1• jnz cont ; if ecx ≠ 0, jump to cont• mov [dotprod], dx ; update dot product in memory• exit: mov eax, 1 ; return control to kernel• int 0x80

Page 58: isa

COMP 228, Fall 2009 Instruction Set Architectures 57

Addendum Pentium ISP• Write a program fragment that computes the sum of 0 + 1

+ 2 + … + x where x is an integer already stored in register eax [if x = 0 then a result of 0 should be returned in register ebx.]

mov ebx, 0 // initialize ebx to 0again: cmp eax, 0 // integer > 0?

jle done // integer ≤ 0, get outadd ebx, eaxdec eaxjmp again

done ::::::

Page 59: isa

COMP 228, Fall 2009 Instruction Set Architectures 58

Addendum Pentium ISP

• Procedure Call/Return via Stack– call subr will lead to the following:

• The return address (address of the instruction that follows this instruction) will be pushed onto the stack

• eip will be changed to subr (this resembles Jump subrin MARIE)

– ret will lead to the following:• The return address is popped from the stack and moved

into eip; hence control returns to the caller that previously has executed call subr

• This can be contrasted with JumpI in MARIE

Page 60: isa

COMP 228, Fall 2009 Instruction Set Architectures 59

Addendum Pentium ISP

• Diagram demonstrating procedure call/return

Memory

esp

call subr

Memory

esp return address

executes

eip

Memory

executesret

eip

esp

push eip onto the stack pop eip from the stack

Page 61: isa

COMP 228, Fall 2009 Instruction Set Architectures 60

Addendum Pentium ISP

• The stack allows multiple level procedure call and return, including recursive calls (when a procedure calls itself directly or indirectly).

• The stack can also be used as the medium for passing data from the caller to the procedure and vice versa. For example, a caller can push data onto the stack before executing call subr. When the procedure (subr) is entered, the latter can access these data via the stack, assuming there is a clear protocol between the caller and the procedure where and what these data are.

• An example is given next.

Page 62: isa

COMP 228, Fall 2009 Instruction Set Architectures 61

Addendum Pentium ISP• Example: The caller pushes two 4-byte data onto the stack before

calling the procedure (sum). The procedure should replace one of these two data with their sum on the stack before returning the caller.

Caller: Procedure sum:push dword[X] sum: push eaxpush dword[Y] mov eax, [esp+12]call sum add [esp+8], eaxpop ebx pop eax

ret

*Note: dword is NASM notation for a 4-byte memory operand.The extra push/pop eax saves/restores the content of register eaxthat belongs to the caller (to avoid contaminating important data that belongs to the caller).

Page 63: isa

COMP 228, Fall 2009 Instruction Set Architectures 62

Addendum Pentium ISP

• Snapshots of the Stack in Call/Return:

Upon executing push dword[Y]

esp value of Y

value of X

Upon executing call sum

espvalue of Y

value of X

return address

Page 64: isa

COMP 228, Fall 2009 Instruction Set Architectures 63

Addendum Pentium ISP

• More Snapshots of the Stack

Upon executing push eax

esp

value of Y

value of X

return address

eax

Upon executing ret

esp value of X+Y

value of X