UNIT II ASSEMBLERS - Notes · PDF fileIntroduction to Assemblers ... SIC Instruction Format...

42
1 UNIT II ASSEMBLERS www.noyesengine.com www.technoscriptz.com

Transcript of UNIT II ASSEMBLERS - Notes · PDF fileIntroduction to Assemblers ... SIC Instruction Format...

Page 1: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

1

UNIT II ASSEMBLERS

www.noyesengine.com

www.technoscriptz.com

Page 2: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

2

1. BASIC ASSEMBLER FUNCTIONS

2. A SIMPLE SIC ASSEMBLER

3. ASSEMBLER ALGORITHM AND DATA STRUCTURES

4. MACHINE DEPENDENT ASSEMBLER FEATURES

5. INSTRUCTION FORMATS AND ADDRESSING MODES

6. PROGRAM RELOCATION

7. MACHINE INDEPENDENT ASSEMBLER FEATURES

8. LITERALS

9. SYMBOL-DEFINING STATEMENTS

Page 3: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

3

10. EXPRESSIONS

11. ONE PASS ASSEMBLERS

12. MULTI PASS ASSEMBLERS

13. IMPLEMENTATION EXAMPLE

14. MASM ASSEMBLER

Assemblers

Basic Assembler Functions

Machine-dependent Assembler Features

Machine-independent Assembler Features

Assembler Design Options

Role of Assembler

Page 4: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

4

Introduction to Assemblers

Fundamental functions translating mnemonic operation codes to their machine language equivalents

assigning machine addresses to symbolic labels

Machine dependency different machine instruction formats and codes

Source

Program

Assembler Object

Code

Loader

Executable

Code

Linker

Page 5: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

5

Example Program

Object Code

Page 6: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

6

Page 7: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

7

Purpose reads records from input device (code F1)

copies them to output device (code 05)

at the end of the file, writes EOF on the output device, then RSUB to the operating

system

program

Data transfer (RD, WD) a buffer is used to store record

buffering is necessary for different I/O rates

the end of each record is marked with a null character (0016)

the end of the file is indicated by a zero-length record

Subroutines (JSUB, RSUB) RDREC, WRREC

save link register first before nested jump

Assembler Directives

Pseudo-Instructions Not translated into machine instructions

Page 8: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

8

Providing information to the assembler

Basic assembler directives START

END

BYTE

WORD

RESB

RESW

Functions of a Basic Assembler Mnemonic code (or instruction name) opcode.

Symbolic operands (e.g., variable names) addresses.

Choose the proper instruction format and addressing mode.

Constants Numbers.

Output to object files and listing files.

SIC Instruction Set Load/Store: LDA/STA, LDX/STX…etc.

Arithmetic: ADD, SUB, MUL, DIV

Compare: COMP

Jump: J

Conditional Jump: JLT, JEQ, JGT

See Appendix A for the complete list.

SIC Instruction Format

Opcode: 8 bits Address: one bit flag (x) and 15 bits of address

Examples

Mnemonic code (or instruction name) Opcode. Variable names, Labels, Subroutines, Constants Address

Examples:

OPCODE X Address

8 1 15

Page 9: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

9

STL RETADR 14 10 33

STCH BUFFER,X 54 90 39

Converting Symbols to Numbers

Isn’t it straightforward? Isn’t it simply the sequential processing of the source program, one line at

a time? Not so, if we have forward references.

Two Pass Assembler

Pass 1 Assign addresses to all statements in the program Save the values assigned to all labels for use in Pass 2 Perform some processing of assembler directives

Pass 2 Assemble instructions Generate data values defined by BYTE, WORD Perform processing of assembler directives not done in Pass 1 Write the object program and the assembly listing

Read from input line LABEL, OPCODE, OPERAND

COPY START 1000

LDA LEN

LEN RESW 1

0001 0100 0 001 0000 0011 0011

0101 0100 1 001 0000 0011 1001

Page 10: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

10

Two Pass Assembler – Pass 1

• Assign addresses to all statements in the program

• Save the values (addresses) assigned to all labels for use in Pass 2

• Perform some processing of assembler directives. (This includes processing that affects address

assignment, such as determining the length of data areas defined by BYTE, RESW, etc.

PASS-I Algorithm

begin

read first input line

if OPCODE = 'START' then

begin

save #[OPERAND] as starting address

initialized LOCCTR to starting address

write line to intermediate file

read next input line

end {if START}

else

initialized LOCCTR to 0

while OPCODE != 'END' do

begin

if this is not a comment line then

begin

if there is a symbol in the LABEL field then

begin

search SYMTAB for LABEL

if found then

set error flag (duplicate symbol)

else

insert (LABEL, LOCCTR) into SYMTAB

Pass 1 Pass 2

Intermediat

e

file

Object

codes

Source

progra

m

OPTAB SYMTAB SYMTAB

Page 11: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

11

end {if symbol}

search OPTAB for OPCODE

if found then

add 3 {instruction lengh} to LOCCTR

else if OPCODE = 'WORD' then

add 3 to LOCCTR

else if OPCODE = 'RESW' then

add 3 * #[OPERAND] to LOCCTR

else if OPCODE = 'RESB' then

add #[OPERAND] to LOCCTR

else if OPCODE = 'BYTE' then

begin

find length of constant in bytes

add length to LOCCTR

end {if BYTE}

else

set error flag (invalid operation code)

end {if not a comment}

write line to intermediate file

read next input line

end {while not END}

write last line to intermediate file

save (LOCCTR - starting address) as program length

end

Two Pass Assembler – Pass 2

• Assemble instructions (translating operation codes and looking up addresses).

• Generate data values defined by BYTE, WORD, etc.

• Perform processing of assembler directives not done during Pass 1

• Write the object program and the assembly listing

PASS-2 Algorithm

begin

read first input file {from intermediate file}

if OPCODE = 'START' then

begin

write listing line

read next input line

end {if START}

write header record to object program

initialized first Text record

while OPCODE != 'END' do

begin

if this is not a comment line then

Page 12: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

12

begin

search OPTAB for OPCODE

if found then

begin

if there is a symbol in OPERAND field then

begin

search SYMTAB for OPERAND

if found then

store symbol value as operand address

else

begin

store 0 as operand address

set error flag (undefined symbol)

end

end {if symbol}

else

store 0 as operand address

assemble the object code instruction

end {if opcode found}

else if OPCODE = 'BYTE' or 'WORD' then

convert constant to object code

if object code not fit into the current Text record then

begin

write Text record to object program

initialized new Text record

end

add object code to Text record

end {if not comment}

write listing line

read next input line

end {while not END}

write last Text record to object program

write End record to object program

write last listing line

end

Data Structures

Operation Code Table (OPTAB) Symbol Table (SYMTAB) Location Counter(LOCCTR)

OPTAB (operation code table)

• Operation Code Table (OPTAB)

– Used to look up mnemonic operation codes and translate them into machine language

equivalents

– Contains the mnemonic operation code and its machine language equivalent

– In more complex assemblers, contains information like instruction format and length

Page 13: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

13

SYMTAB (symbol table)

Used to store values (addresses) assigned to labels

Includes the name and value for each label

Flags to indicate error conditions, e.g. duplicate definition of labels

May contain other info like type or length about the data area or instruction labeled

LOCCTR(Location counter) – Used to help in the assignment of addresses – Initialized to the beginning address specified in the START

statement – After each source statement is processed, the length of the

assembled instruction or data area to be generated is added – Gives the address of a label

COPY 1000

FIRST 1000

CLOOP 1003

ENDFIL 1015

EOF 1024

THREE 102D

ZERO 1030

RETADR 1033

LENGTH 1036

BUFFER 1039

RDREC 2039

Page 14: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

14

Object Program

Header

Col. 1 H

Col. 2~7 Program name

Col. 8~13 Starting address (hex)

Col. 14-19 Length of object program in bytes (hex)

Text

Col.1 T

Col.2~7Starting address in this record (hex)

Col. 8~9 Length of object code in this record in bytes (hex)

Col. 10~69 Object code (69-10+1)/6=10 instructions

End

Col.1 E

Col.2~7Address of first executable instruction (hex)

(END program_name)

H COPY 001000 00107A

T 001000 1E 141033 482039 001036 281030 301015 482061 ...

T 00101E 15 0C1036 482061 081044 4C0000 454F46 000003 000000

T 002039 1E 041030 001030 E0205D 30203F D8205D 281030 …

T 002057 1C 101036 4C0000 F1 001000 041030 E02079 302064 …

T 002073 07 382064 4C0000 05

E 001000

Assembler Design

Machine Dependent Assembler Features instruction formats and addressing modes

program relocation

Machine Independent Assembler Features literals

symbol-defining statements

expressions

program blocks

control sections and program linking

Page 15: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

15

Machine-dependent Assembler Features

IInnssttrruuccttiioonn ffoorrmmaattss aanndd aaddddrreessssiinngg mmooddeess

PPrrooggrraamm rreellooccaattiioonn

For Eg STL RETADR 17 20 2D

The mode bit p=1, meaning PC relative addressing mode.

Instruction Format and Addressing Mode SIC/XE

PC-relative or Base-relative addressing: op m

Indirect addressing: op @m

Immediate addressing: op #c

Extended format: +op m

Index addressing: op m,x

register-to-register instructions

larger memory -> multi-programming (program allocation)

Translation

Register translation register name (A, X, L, B, S, T, F, PC, SW) and their values (0,1, 2, 3, 4, 5, 6, 8, 9)

preloaded in SYMTAB

Address translation Most register-memory instructions use program counter relative or base relative addressing

Format 3: 12-bit address field base-relative: 0~4095

pc-relative: -2048~2047

Format 4: 20-bit address field

OPCODE e Address

6

bits

12 bits

n i x b p

0001 01 0 0000 0010 1101 1 1 0 0 1

1

7

2D 2

0

Page 16: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

16

PC-Relative Addressing Modes

PC-relative 10 0000 FIRST STL RETADR 17202D

displacement= RETADR - PC = 30-3 = 2D

40 0017 J CLOOP 3F2FEC

displacement= CLOOP-PC= 6 - 1A= -14= FEC

Base-Relative Addressing Modes

Base-relative base register is under the control of the programmer

12 LDB #LENGTH

13 BASE LENGTH

160 104E STCH BUFFER, X 57C003

OPCODE e Address n i x b p

0001 01 0 (02D)16 1 1 0 0 1

OPCODE e Address n i x b p

0011 11 0 (FEC)16 1 1 0 0 1

Page 17: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

17

displacement= BUFFER - B = 0036 - 0033 = 3

NOBASE is used to inform the assembler that the contents of the base register no longer be relied

upon for addressing

Immediate Address Translation

Immediate addressing 55 0020 LDA #3 010003

133 103C +LDT #4096 75101000

Immediate Address Translation (Cont.)

Immediate addressing 12 0003 LDB #LENGTH 69202D

OPCODE e Address n i x b p

0101 01 0 (003)16 1 1 1 1 0

OPCODE e Address n i x b p

0000 00 0 (003)16 0 1 0 0 0

OPCODE e Address n i x b p

0111 01 1 (01000)16 0 1 0 0 0

Page 18: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

18

the immediate operand is the symbol LENGTH

the address of this symbol LENGTH is loaded into register B

LENGTH=0033=PC+displacement=0006+02D

if immediate mode is specified, the target address becomes the operand Indirect Address Translation

Indirect addressing target addressing is computed as usual (PC-relative or BASE-relative)

only the n bit is set to 1 70 002A J @RETADR 3E2003

TA=RETADR=0030

TA=(PC)+disp=002D+0003

Program Relocation

OPCODE e Address n i x b p

0110 10 0 (02D)16 0 1 0 0 1

OPCODE e Address n i x b p

0011 11 0 (003)16 1 0 0 0 1

Page 19: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

19

Example Absolute program, starting address 1000

e.g. 55 101B LDA THREE 00102D

Relocate the program to 2000

e.g. 55 101B LDA THREE 00202D

Each Absolute address should be modified

Example Except for absolute address, the rest of the instructions need not be modified

not a memory address (immediate addressing)

PC-relative, Base-relative

The only parts of the program that require modification at load time are those that specify direct

addresses

Relocatable Program

Modification record Col 1 M

Col 2-7 Starting location of the address field to be

Page 20: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

20

modified, relative to the beginning of the program

Col 8-9 length of the address field to be modified, in half-

bytes

Object Code with M-Records

Modification records are added to the object files. (See pp.64-65 and Figure 2.8.)

Example:

HCOPY 001000 001077

T000000 1D 17202D…4B101036…

T00001D ……

M000007 05 Modification Record

……

E000000

Object Code

Machine-Independent Assembler Features LLiitteerraallss

SSyymmbbooll DDeeffiinniinngg SSttaatteemmeenntt

EExxpprreessssiioonnss

PPrrooggrraamm BBlloocckkss

CCoonnttrrooll SSeeccttiioonnss aanndd PPrrooggrraamm LLiinnkkiinngg

Literals

Design idea Let programmers to be able to write the value of a constant operand as a part of the

instruction that uses it.

This avoids having to define the constant elsewhere in the program and make up a

label for it.

Example e.g. 45 001A ENDFIL LDA =C’EOF’ 032010

93 LTORG

002D * =C’EOF’ 454F46 e.g. 215 1062 WLOOP TD =X’05’ E32011

Page 21: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

21

Literals vs. Immediate Operands

Immediate Operands The operand value is assembled as part of the machine instruction e.g. 55 0020 LDA #3 010003

Literals The assembler generates the specified value as a constant at some other memory

location e.g. 45 001A ENDFIL LDA =C’EOF’ 032010

Compare (Fig. 2.6) e.g. 45 001A ENDFIL LDA EOF 032010

80 002D EOF BYTE C’EOF’ 454F46

Literal - Implementation

Literal pools Normally literals are placed into a pool at the end of the program

see (END statement) In some cases, it is desirable to place literals into a pool at some other location in the

object program

assembler directive LTORG

reason: keep the literal operand close to the instruction

Duplicate literals e.g. 215 1062 WLOOP TD =X’05’

e.g. 230 106B WD =X’05’

The assemblers should recognize duplicate literals and store only one copy of the

specified data value

Comparison of the defining expression • Same literal name with different value, e.g. LOCCTR=*

Comparison of the generated data value • The benefits of using generate data value are usually not great enough to justify

the additional complexity in the assembler

LITTAB literal name, the operand value and length, the address assigned to the operand

Pass 1 build LITTAB with literal name, operand value and length, leaving the address unassigned

when LTORG statement is encountered, assign an address to each literal not yet assigned an

address

Pass 2

C'EOF' 454F46 3 002D

X'05' 05 1 1076

Page 22: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

22

search LITTAB for each literal operand encountered

generate data values using BYTE or WORD statements

generate modification record for literals that represent an address in the program

Symbol-Defining Statements

Labels on instructions or data areas the value of such a label is the address assigned to the statement

Defining symbols symbol EQU value

value can be: constant, other symbol, expression

making the source program easier to understand

no forward reference

Example 1 MAXLEN EQU 4096

+LDT #MAXLEN

Example 2 (Many general purpose registers) BASE EQU R1

COUNT EQU R2

INDEX EQU R3

Example 3 MAXLEN EQU BUFEND-BUFFER

Counter Example

BETA EQU ALPHA

ALPHA RESW 1

BETA cannot be assigned a value when it is encountered during Pass 1 of the assembly

(because ALPHA does not yet have a value).

ORG (origin)

Indirectly assign values to symbols

Reset the location counter to the specified value ORG value

Value can be: constant, other symbol, expression

No forward reference

Example

SYMBOL: 6bytes

VALUE: 1word

FLAGS: 2bytes LDA VALUE, X

+LDT #4096

Page 23: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

23

ORG Example

Using EQU statements STAB RESB 1100

SYMBOL EQU STAB

VALUE EQU STAB+6

FLAG EQU STAB+9

Using ORG statements STAB RESB 1100

ORG STAB

SYMBOL RESB 6

VALUE RESW 1

FLAGS RESB 2

ORG STAB+1100

Expressions

Expressions can be classified as absolute expressions or relative expressions MAXLEN EQU BUFEND-BUFFER

BUFEND and BUFFER both are relative terms, representing addresses within the program

However the expression BUFEND-BUFFER represents an absolute value

When relative terms are paired with opposite signs, the dependency on the program starting

address is canceled out; the result is an absolute value

SYMTAB

None of the relative terms may enter into a multiplication or division operation

Errors: BUFEND+BUFFER

100-BUFFER

3*BUFFER

The type of an expression

keep track of the types of all symbols defined in the program

Program Blocks

Program blocks refer to segments of code that are rearranged within a single object program unit

USE [blockname]

Default block

Each program block may actually contain several separate segments of the source

program

SYMBOL VALUE FLAGS

STAB

(100 entries)

. . .

. . .

. . .

Symbol Type Value

RETADR R 30

BUFFER R 36

BUFEND R 1036

MAXLEN A 1000

Page 24: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

24

Implementation

Pass 1 each program block has a separate location counter

each label is assigned an address that is relative to the start of the block that contains it

at the end of Pass 1, the latest value of the location counter for each block indicates the length of

that block

the assembler can then assign to each block a starting address in the object program

Pass 2 The address of each symbol can be computed by adding the assigned block starting address and the

relative address of the symbol to that block

Each source line is given a relative address assigned and a block number

For absolute symbol, there is no block number line 107

Example 20 0006 0 LDA LENGTH 032060

LENGTH=(Block 1)+0003= 0066+0003= 0069

LOCCTR=(Block 0)+0009= 0009

Block name Block number Address Length

(default) 0 0000 0066

CDATA 1 0066 000B

CBLKS 2 0071 1000

Page 25: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

25

Program Readability

Program readability No extended format instructions on lines 15, 35, 65

No needs for base relative addressing (line 13, 14)

LTORG is used to make sure the literals are placed ahead of any large data areas (line 253)

Object code It is not necessary to physically rearrange the generated code in the object program

Control Sections and Program Linking

Control Sections are most often used for subroutines or other logical subdivisions of a program

the programmer can assemble, load, and manipulate each of these control sections

separately

instruction in one control section may need to refer to instructions or data located in

another section

because of this, there should be some means for linking control sections together

Page 26: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

26

External Definition and References

External definition EXTDEF name [, name]

EXTDEF names symbols that are defined in this control section and may be used by

other sections

External reference EXTREF name [,name]

EXTREF names symbols that are used in this control section and are defined

elsewhere

Example 15 0003 CLOOP +JSUB RDREC 4B100000

160 0017 +STCH BUFFER,X 57900000

190 0028 MAXLEN WORD BUFEND-BUFFER 000000

Implementation

The assembler must include information in the object program that will cause the loader to insert

proper values where they are required Define record

Col. 1 D

Col. 2-7 Name of external symbol defined in this control section

Col. 8-13 Relative address within this control section (hexadeccimal)

Col.14-73 Repeat information in Col. 2-13 for other external symbols

Refer record Col. 1 D

Col. 2-7 Name of external symbol referred to in this control section

Page 27: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

27

Col. 8-73 Name of other external reference symbols

Modification Record

Modification record Col. 1 M

Col. 2-7 Starting address of the field to be modified (hexiadecimal)

Col. 8-9 Length of the field to be modified, in half-bytes (hexadeccimal)

Col.11-16 External symbol whose value is to be added to or subtracted from the indicated field

Note: control section name is automatically an external symbol, i.e. it is available for use in

Modification records.

Example Figure 2.17

M00000405+RDREC

M00000705+COPY

External References in Expression

Earlier definitions required all of the relative terms be paired in an expression (an absolute expression),

or that all except one be paired (a relative expression)

New restriction Both terms in each pair must be relative within the same control section Ex: BUFEND-BUFFER

Ex: RDREC-COPY

In general, the assembler cannot determine whether or not the expression is legal at assembly

time. This work will be handled by a linking loader.

Assembler Design Options OOnnee--ppaassss aasssseemmbblleerrss

MMuullttii--ppaassss aasssseemmbblleerrss

TTwwoo--ppaassss aasssseemmbblleerr wwiitthh oovveerrllaayy ssttrruuccttuurree

Two-Pass Assembler with Overlay Structure For small memory pass 1 and pass 2 are never required at the same time

three segments

root: driver program and shared tables and subroutines

pass 1

pass 2 tree structure

overlay program

One-Pass Assemblers

Main problem forward references

Page 28: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

28

data items

labels on instructions

Solution data items: require all such areas be defined before they are referenced

labels on instructions: no good solution

Main Problem forward reference

data items

labels on instructions

Two types of one-pass assembler load-and-go

produces object code directly in memory for immediate execution the other

produces usual kind of object code for later execution

Load-and-go Assembler

Characteristics Useful for program development and testing

Avoids the overhead of writing the object program out and reading it back

Both one-pass and two-pass assemblers can be designed as load-and-go.

However one-pass also avoids the over head of an additional pass over the source

program

For a load-and-go assembler, the actual address must be known at assembly time,

we can use an absolute program

Forward Reference in One-pass Assembler For any symbol that has not yet been defined

1. omit the address translation

2. insert the symbol into SYMTAB, and mark this symbol undefined

3. the address that refers to the undefined symbol is added to a list of forward references

associated with the symbol table entry

4. when the definition for a symbol is encountered, the proper address for the symbol is

then inserted into any instructions previous generated according to the forward reference

list

At the end of the program any SYMTAB entries that are still marked with * indicate undefined symbols

search SYMTAB for the symbol named in the END statement and jump to this

location to begin execution

Page 29: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

29

The actual starting address must be specified at assembly time Example

Producing Object Code

When external working-storage devices are not available or too slow (for the intermediate file

between the two passes

Solution: When definition of a symbol is encountered, the assembler must generate another Tex record with

the correct operand address

Page 30: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

30

The loader is used to complete forward references that could not be handled by the assembler

The object program records must be kept in their original order when they are presented to the

loader

Example

Multi-Pass Assemblers

Restriction on EQU and ORG no forward reference, since symbols’ value can’t be defined during the first pass

Example Use link list to keep track of whose value depend on an undefined symbol

Page 31: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

31

Page 32: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

32

Page 33: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

33

ASSEMBLERS

1. Introduction

There are two main classes of programming languages: high level (e.g., C, Pascal) and low

level. Assembly Language is a low level programming language. Programmers code symbolic

instructions, each of which generates machine instructions.

An assembler is a program that accepts as input an assembly language program (source) and

produces its machine language equivalent (object code) along with the information for the loader.

Assembly language program

Assembler Linker EXE

Figure 1. Executable program generation from an assembly source code

Advantages of coding in assembly language are:

Provides more control over handling particular hardware components

May generate smaller, more compact executable modules

Often results in faster execution

Disadvantages:

Not portable

More complex

Requires understanding of hardware details (interfaces)

Assembler:

An assembler does the following:

1. Generate machine instructions

- evaluate the mnemonics to produce their machine code

Page 34: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

34

- evaluate the symbols, literals, addresses to produce their equivalent machine addresses

- convert the data constants into their machine representations

2. Process pseudo operations

2. Two Pass Assembler

A two-pass assembler performs two sequential scans over the source code:

Pass 1: symbols and literals are defined

Pass 2: object program is generated

Parsing: moving in program lines to pull out op-codes and operands

Data Structures:

- Location counter (LC): points to the next location where the code will be placed

- Op-code translation table: contains symbolic instructions, their lengths and their op-codes (or

subroutine to use for translation)

- Symbol table (ST): contains labels and their values

- String storage buffer (SSB): contains ASCII characters for the strings

- Configuration table: contains pointer to the string in SSB and offset where its value will be inserted

in the object code

assembly machine

language Pass1 Pass 2 language

program Symbol table program

Configuration table

String storage buffer

Partially configured object file

Figure 2. A simple two pass assembler.

Page 35: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

35

Example 1: Decrement number 5 by 1 until it is equal to zero.

assembly language memory object code

program address in memory

-----------------------

START 0100H

LDA #5 0100 01

0101 00

0102 05

LOOP: SUB #1 0103 1D

0104 00

0105 01

COMP #0 0106 29

0107 00

0108 00

JGT LOOP 0109 34

010A 01 placed in Pass 1

010B 03

RSUB 010C 4C

010D 00

010E 00

END

Op-code Table

Mnemonic Addressing

mode

Opcode

LDA immediate 01

SUB immediate 1D

COMP immediate 29

LDX immediate 05

ADD indexed 18

TIX direct 2C

JLT direct 38

JGT direct 34

RSUB implied 4C

Symbol Table

Symbo

l

Value

LOOP 0103

Page 36: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

36

Example 2: Sum 6 elements of a list which starts at location 200.

assembly language memory object code

program address in memory

-----------------------

START 0100H

LDA #0 0100 01

0101 00

0102 00

LDX #0 0103 05

0104 00

0105 00

LOOP: ADD LIST, X 0106 18

0107 01 placed in Pass 2

0108 12

TIX COUNT 0109 2C

010A 01 placed in Pass 2

010B 15

JLT LOOP 010C 38

010D 01 placed in Pass 1

010E 06

RSUB 010F 4C

0110 00

0111 00

LIST: WORD 200 0112 00

0113 02

0114 00

COUNT: WORD 6 0115 00

0116 00

0117 06

END

Symbol Table

Symbo

l

Addres

s

LOOP 0106

Page 37: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

37

LIST 0112

COUN

T

0115

Configuration Table

Offset SSB pointer

for the

symbol

0007 DC00

000A DC05

SSB

DC00 4C

H

DC01 49H ASCII for L,I,S,T

DC02 53H

DC03 54H

DC04 5E

H

ASCII for separation

character

DC05

Pass1

All symbols are identified and put in ST

All op-codes are translated

Missing symbol values are marked

LC = origin

Read next statement

Parse the statement

Y

Comment

N

“END” Y Pass 2

Page 38: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

38

N

N pseudo-op Y what

kind?

N

Label EQU WORD/ RESW/RESB

BYTE

Y

N Label N Label

Enter label in ST Enter label in ST

Y Y

Enter label in ST Enter label in ST

Call translator

Place constant in

machine code

Advance LC by the

number of bytes specified

Advance LC in the pseudo-op

Figure 3. First pass of a two-pass assembler.

Page 39: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

39

Translator Routine

Find opcode and the number of bytes

in Op-code Table

Write opcode in machine code

Write the data or address that is known

at this time in machine code

more

information Y

will be needed Set up an entry in

in Pass 2 ? Configuration Table

N

Advance LC by the number of bytes

in op-code table

return

Figure 4. Flowchart of a translator routine

Pass 2

- Fills addresses and data that was unknown during Pass 1.

Pass 1

More lines in N

Configuration Done

Table

Y

Get the next line

Retrieve the name of the symbol from SSB

Get the value of the symbol from ST

Page 40: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

40

Compute the location in memory

where this value will be placed

(starting address + offset)

Place the symbol value at this location

Figure 5. Second pass of a two-pass assembler.

Relocatable Code

Producing an object code, which can be placed to any specific area in memory.

Direct Address Table (DAT): contains offset locations of all direct addresses in the program (e.g., 8080

instructions that specify direct addresses are LDA, STA, all conditional jumps...). To relocate the

program, the loader adds the loading point to all these locations.

assembly language program Assembler machine language program

and DAT

Figure 6. Assembler output for a relocatable code.

Example 3: Following relocatable object code and DAT are generated for Example 1.

assembly language memory object code

program address in memory

-----------------------

START

LDA #0 0000 01

0001 00

0002 00

LDX #0 0003 05

0004 00

0005 00

LOOP: ADD LIST, X 0006 18

0007 00

0008 12

TIX COUNT 0009 2C

000A 00

000B 15

JLT LOOP 000C 38

000D 00

000E 06

RSUB 000F 4C

0010 00

0011 00

LIST: WORD 200 0012 00

0013 02

Page 41: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

41

0014 00

COUNT: WORD 6 0015 00

0016 00

0017 06

END

DAT

0007

000A

000D

Forward and backward references in the machine code are generated relative to address 0000. To

relocate the code, the loader adds the new load-point to the references in the machine code which are

pointed by the DAT.

One-Pass Assemblers

Two methods can be used:

- Eliminating forward references

Either all labels used in forward references are defined in the source program before they are

referenced, or forward references to data items are prohibited.

- Generating the object code in memory

No object program is written out and no loader is needed. The program needs to be re-assembled

every time.

Multi-Pass Assemblers

Make as many passes as needed to process the definitions of symbols.

Example 3:

A EQU B

B EQU C

C DS 1

3 passes are required to find the address for A.

Page 42: UNIT II ASSEMBLERS - Notes  · PDF fileIntroduction to Assemblers ... SIC Instruction Format Opcode: 8 bits

42

Such references can also be solved in two passes: entering symbol definitions that

involve forward references in the symbol table. Symbol table also indicates which

symbols are dependent on the values of others.

Example 4:

A EQU B

B EQU D

C EQU D

D DS 1

At the end of Pass1:

Symbol Table

A &1 B 0

B &1 D A 0

C &1 D 0

D 200 B C 0

After evaluating dependencies:

Symbol Table

A 200 0

B 200 0

C 200 0

D 200 0