2 Assemblers

Unit 2 Assemblers

Structure:

2.1 Introduction

Objectives

2.2 Assembly Language

2.3 Basic Assembler Functions

2.4 Design Specification of Assembler

2.4.1 Data Structures

2.4.2 Pass1 & pass2 Assembler flow charts

2.5 Tasks of Assemblers

2.6 Translation of Assemblers

2.7 Examples: MASM Assembler and SPARC Assembler

2.8 Summary

2.9 Terminal Questions

2.1 Introduction

An assembler is a program (system software) which accepts an assembly language program as input and produces its equivalent machine language program as output, along with information for the loader. The input to the assembler program is called the source program and the output is called the object program.

Fig. 2.0: Function of an Assembler

An assembly language is a machine dependent, low level programming language which is specific to a certain computer system (or a family of computer systems). Compared to the machine language of a computer system, it provides three basic features which simplify programming:

1. Mnemonic operation codes: Use of mnemonic operation codes (also called mnemonic opcodes) for machine instructions eliminates the need to memorize numeric operation codes. It also enables the assembler to provide helpful diagnostics, for example an indication of misspelled operation codes.

2. Symbolic Operands: Symbolic names can be associated with data or instructions. These symbolic names can be used as operands in assembly statements. The assembler performs memory bindings to these names; the programmer need not know any details of the memory bindings performed by the assembler.

1 of 12 8/7/2012 1:53 PM

3. Data Declarations: Data can be declared in a variety of notations, including the decimal notation. This avoids manual conversion of constants into their internal machine representation, for example, conversion of -5 into (11111010)₂ (an 8-bit one's-complement form) or 10.5 into (41A80000)₁₆ (a hexadecimal floating-point form).
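The two conversions quoted above can be checked with a short sketch. The text does not name the machine representation, so this assumes an 8-bit one's-complement integer and a 32-bit hexadecimal floating-point word (sign bit, 7-bit excess-64 exponent of 16, 24-bit fraction); the function names are illustrative only.

```python
def ones_complement_8bit(value: int) -> str:
    """Return the 8-bit one's-complement bit string for a small integer."""
    if not -127 <= value <= 127:
        raise ValueError("value does not fit in 8-bit one's complement")
    if value >= 0:
        return format(value, "08b")
    # A negative number is the bitwise inverse of its magnitude.
    return format(~(-value) & 0xFF, "08b")


def hex_float_32bit(value: float) -> str:
    """Return a 32-bit hexadecimal floating-point word as 8 hex digits.

    Assumed layout: 1 sign bit, 7-bit excess-64 exponent (power of 16),
    24-bit fraction with 1/16 <= fraction < 1 for non-zero values.
    """
    if value == 0:
        return "00000000"
    sign = 0x80 if value < 0 else 0x00
    fraction = abs(value)
    exponent = 0
    while fraction >= 1:          # scale down into [1/16, 1)
        fraction /= 16
        exponent += 1
    while fraction < 1 / 16:      # scale up into [1/16, 1)
        fraction *= 16
        exponent -= 1
    mantissa = int(fraction * (1 << 24))   # truncate to 24 bits
    return format(((sign | (exponent + 64)) << 24) | mantissa, "08X")


print(ones_complement_8bit(-5))   # 11111010
print(hex_float_32bit(10.5))      # 41A80000
```

With a data declaration the programmer writes only the decimal constant; a conversion of this kind is left to the assembler.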

Statement format

An assembly language statement has the following format:

[Label] <Opcode> <operand spec>[,<operand spec> ..]

where the notation [..] indicates that the enclosed specification is optional. If a label is specified in a statement, it is associated as a symbolic name with the memory word(s) generated for the statement. <operand spec> has the following syntax:

<symbolic name> [+<displacement>][(<index register>)]

Thus, some possible operand forms are: AREA, AREA+5, AREA (4), and AREA+5 (4). The first specification refers to the memory word with which the name AREA is associated. The second specification refers to the memory word 5 words away from the word with the name AREA. Here '5' is the displacement or offset from AREA. The third specification implies indexing with index register 4; that is, the operand address is obtained by adding the contents of index register 4 to the address of AREA. The last specification is a combination of the previous two specifications.
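The four operand forms can be made concrete with a small parsing sketch. The regular expression, the function names, the address bound to AREA and the contents of index register 4 are all assumptions chosen for illustration; they are not part of the language definition above.

```python
import re
from typing import Optional, Tuple

# <symbolic name> [+<displacement>] [(<index register>)]
OPERAND_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9]*)"
    r"\s*(?:\+\s*(?P<disp>\d+))?"
    r"\s*(?:\(\s*(?P<index>\d+)\s*\))?\s*$"
)


def parse_operand(spec: str) -> Tuple[str, int, Optional[int]]:
    """Split an operand spec into (name, displacement, index register)."""
    match = OPERAND_RE.match(spec)
    if match is None:
        raise ValueError(f"malformed operand spec: {spec!r}")
    disp = int(match["disp"]) if match["disp"] else 0
    index = int(match["index"]) if match["index"] else None
    return match["name"], disp, index


def effective_address(spec, symbols, registers):
    """Symbol address + displacement + contents of the index register."""
    name, disp, index = parse_operand(spec)
    address = symbols[name] + disp
    if index is not None:
        address += registers[index]
    return address


symbols = {"AREA": 200}    # hypothetical address bound to AREA
registers = {4: 10}        # hypothetical contents of index register 4

for spec in ("AREA", "AREA+5", "AREA (4)", "AREA+5 (4)"):
    print(spec, "->", effective_address(spec, symbols, registers))
```

Under these assumed values the four forms resolve to 200, 205, 210 and 215 respectively.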

Objectives:

At the end of this unit the students would be able to:

· Write Assembly Language programs

· Use basic assembler functions

· Understand the passes and tasks of an assembler

2.2 Assembly Language

In this language, each statement has two operands. The first operand is always a register, which can be any one of AREG, BREG, CREG and DREG. The second operand refers to a memory word using a symbolic name and an optional displacement. (Note that indexing is not permitted.)

Table 2.0: Mnemonic operation codes

Table 2.0 lists the mnemonic opcodes for machine instructions. The MOVE instructions move a value between a memory word and a register. In the MOVER instruction the second operand is the source operand and the first operand is the target operand. The converse is true for the MOVEM instruction. All arithmetic is performed in a register (i.e. the result replaces the contents of a register) and sets a condition code. A comparison instruction sets a condition code analogous to a subtract instruction, without affecting the values of its operands. The condition code can be tested by a Branch on Condition (BC) instruction. The assembly statement corresponding to it has the format

BC <condition code spec>, <memory address>

It transfers control to the memory word with the address <memory address> if the current value of the condition code matches <condition code spec>. For simplicity, we assume <condition code spec> to be a character string with obvious meaning, e.g. GT, EQ, etc. A BC statement with the condition code spec ANY implies unconditional transfer of control. In a machine language program, we show all addresses and constants in decimal rather than in octal or hexadecimal.

Assembly Language Statements

An assembly program contains three kinds of statements:

1. Imperative statements

2. Declaration statements

3. Assembler directives.

Imperative statements

An imperative statement indicates an action to be performed during the execution of the assembled program. Each imperative statement typically translates into one machine instruction.

Declaration statements

The syntax of declaration statements is as follows:

[Label] DS <constant>

[Label] DC ‘<Value>’


The DS (short for declare storage) statement reserves areas of memory and associates names with them. Consider the following DS statements:

A DS 1

G DS 200

The first statement reserves a memory area of 1 word and associates the name A with it. The second statement reserves a block of 200 memory words. The name G is associated with the first word of the block. Other words in the block can be accessed through offsets from G, e.g. G+5 is the sixth word of the memory block, etc.

The DC (short for declare constant) statement constructs memory words containing constants. The statement

ONE DC ‘1’

associates the name ONE with a memory word containing the value '1'. The programmer can declare constants in different forms: decimal, binary, hexadecimal, etc. The assembler converts them to the appropriate internal form.

Assembler Directives

Assembler directives instruct the assembler to perform certain actions during the assembly of a program. Some assembler directives are described below.

START <constant>

This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word with address <constant>.

END [<operand spec>]

This directive indicates the end of the source program. The optional <operand spec> indicates the address of the instruction where the execution of the program should begin. (By default, execution begins with the first instruction of the assembled program.)

Self Assessment Questions

1) What is an assembler? Explain its basic functionality.

2) What are the different assembler directives?

3) Is an assembler required for developing software applications? Give your comments.

2.3 Basic Assembler Functions

An assembler must perform the following tasks.

1. Generate instructions

a. Evaluate the mnemonic in the operation field to produce its machine code.

b. Evaluate the subfields to find the value of each symbol, process literals and assign addresses.

2. Process pseudo-operations

We can group these tasks into two passes, or sequential scans over the input. Associated with each task are one or more assembler modules.

Necessity of Two passes for Assembler

Because symbols can appear before they are defined, it is convenient to make two passes over the input. The first pass only defines the symbols; the second pass can then generate the instructions and addresses.

Purposes of the pass 1 and pass 2

Pass 1: Purpose – Define Symbols and Literals

1. Determine length of machine instructions

2. Keep track of Location Counter (LC)

3. Remember values of symbols until pass 2

4. Process some pseudo-operations

Pass 2: Purpose – Generate object program

1. Look up values of symbols

2. Generate instruction

3. Generate data

4. Process pseudo-ops.


Self Assessment Questions

1) What are the basic functions of an assembler?

2.4 Design Specification of an Assembler

There are six steps to be followed in the design of assembler. They are:

1. Specify the problem.

2. Specify data structures

3. Define format of data structures.

4. Specify algorithm

5. Look for modularity. (capability of one program to be subdivided into independent programming units).

6. Repeat 1 through 5 on each module.

In the first step we specify the function the assembler has to perform. The second step identifies the data the assembler needs for its operations. This data is stored in tables, which together are called the database (data structures); the assembler uses the information in the database for further processing. In the third step we specify the format in which data is stored in the database and the contents of each table. The fourth step gives the algorithm, which is then converted into a program. The fifth step divides the program into subproblems, which enables the designer to write the assembler efficiently. Finally, the same steps are repeated for each of the subproblems into which the program has been divided.

Specify the problem or Statement of problem

The fundamental information requirements arise in the synthesis phase of an assembler. Hence it is best to begin by considering the information requirements of the synthesis tasks. We then consider how to make this information available, i.e. whether it should be collected during analysis or derived during synthesis.

Consider the assembly statement

MOVER BREG, ONE

We must have the following information to synthesize the machine instruction corresponding to this statement:

1. Address of the memory word with which name ONE is associated,

2. Machine operation code corresponding to the mnemonic MOVER.

The first item of information depends on the source program. Hence it must be made available by the analysis phase. The second item of information does not depend on the source program; it depends only on the assembly language. Hence the synthesis phase can determine this information for itself.
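The split between the two items of information can be sketched as follows. The opcode and register codes are read off the machine instructions listed in Table 2.2 (AREG = 1 and DREG = 4 are assumed by extending the visible pattern), and the function name and word format are illustrative only.

```python
# Codes as they appear in the Table 2.2 listing; AREG = 1 and DREG = 4
# are assumptions made by extending the visible pattern.
OPCODES = {"STOP": 0, "ADD": 1, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "BC": 7, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def synthesize(mnemonic, register, symbol, symbol_table):
    """Build the decimal machine word '+ oo r aaa' for one statement."""
    opcode = OPCODES[mnemonic]          # depends only on the language
    reg_code = REGISTERS[register]      # depends only on the language
    address = symbol_table[symbol]      # must be supplied by analysis
    return f"+ {opcode:02d} {reg_code} {address:03d}"


symbol_table = {"ONE": 115}             # address collected during analysis
print(synthesize("MOVER", "BREG", "ONE", symbol_table))   # + 04 2 115
```

Only the symbol table lookup needs information from the source program; the two fixed dictionaries stand in for knowledge built into the assembler.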

2.4.1 Data Structures

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures

1. Input source program

2. A Location Counter (LC), used to keep track of each instruction’s location.

3. A table, the Machine-operation Table (MOT), that indicates the symbolic mnemonic for each instruction and its length (two, four, or six bytes).

4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.

5. A table, the Symbol Table (ST), that is used to store each label and its corresponding value.

6. A table, the Literal Table (LT), that is used to store each literal encountered and its corresponding assignment location.

7. A copy of the input to be used by pass 2.

Pass 2 Data Structures

1. Copy of source program input to pass1.

2. Location Counter (LC)

3. A table, the Machine-operation Table (MOT), that indicates for each instruction the symbolic mnemonic, length (two, four, or six bytes), binary machine opcode and format of instruction.

4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 2.

5. A table, the Symbol Table (ST), prepared by pass1, containing each label and corresponding value.

6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.

7. A work space, INST, that is used to hold each instruction as its various parts are being assembled together.

8. A work space, PRINT LINE, used to produce a printed listing.

9. A work space, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.

10. An output deck of assembled instructions in the format needed by the loader.

Fig. 2.1: Data structures of the Assembler

Format of Data Structures

The third step in our design procedure is to specify the format and content of each of the data structures. Pass 2 requires a machine operation table (MOT) containing the name, length, binary code and format; pass 1 requires only the name and length. Instead of using two different tables, we construct a single MOT. The machine operation table (MOT) and the pseudo-operation table (POT) are examples of fixed tables. The contents of these tables are not filled in or altered during the assembly process.

The following Table 2.1 shows the format of the machine-op table (MOT). Each entry occupies 6 bytes.

Table 2.1: Format of the machine-op table (MOT)

Mnemonic opcode         Binary opcode            Instruction length   Instruction format   Not used here
(4 bytes, characters)   (1 byte, hexadecimal)    (2 bits, binary)     (3 bits, binary)     (3 bits)
“Abbb”                  5A                       10                   001
“AHbb”                  4A                       10                   001
“ALbb”                  5E                       10                   001
“ALRb”                  1E                       01                   000
…                       …                        …                    …

‘b’ represents “blank”
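A 6-byte entry of this shape can be packed and unpacked as in the sketch below. The text gives the field widths but not the bit positions inside the last byte, so the ordering used here (length in the two high bits, format in the next three, three low bits unused) is an assumption, as are the function names.

```python
import struct


def pack_mot_entry(mnemonic: str, opcode: int, length: int, fmt: int) -> bytes:
    """Pack one MOT entry into 6 bytes.

    Assumed layout of the last byte (not specified in the text):
    bits 7-6 = instruction length, bits 5-3 = format, bits 2-0 = unused.
    """
    name = mnemonic.ljust(4).encode("ascii")   # blank-padded to 4 characters
    if len(name) != 4:
        raise ValueError("mnemonic longer than 4 characters")
    flags = ((length & 0b11) << 6) | ((fmt & 0b111) << 3)
    return struct.pack("4sBB", name, opcode, flags)


def unpack_mot_entry(entry: bytes):
    """Recover (mnemonic, opcode, length, format) from a packed entry."""
    name, opcode, flags = struct.unpack("4sBB", entry)
    return name.decode("ascii").rstrip(), opcode, flags >> 6, (flags >> 3) & 0b111


entry = pack_mot_entry("A", 0x5A, 0b10, 0b001)   # first row of Table 2.1
print(len(entry), entry.hex())                   # 6 412020205a88
print(unpack_mot_entry(entry))                   # ('A', 90, 2, 1)
```

Because the MOT is a fixed table, entries like this would be built once and only read during assembly.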

2.4.2 The Flow Chart for Pass-1

The primary function performed by the analysis phase is the building of the symbol table. For this purpose it must determine the addresses with which the symbol names used in a program are associated. It is possible to determine some addresses directly, e.g. the address of the first instruction in the program; however, others must be inferred.

To implement memory allocation a data structure called the location counter (LC) is introduced. The location counter is always made to contain the address of the next memory word in the target program. It is initialized to the constant specified in the START statement. Whenever the analysis phase sees a label in an assembly statement, it enters the label and the contents of LC in a new entry of the symbol table. It then finds the number of memory words required by the assembly statement and updates the LC contents. This ensures that LC points to the next memory word in the target program even when machine instructions have different lengths and DS/DC statements reserve different amounts of memory. To update the contents of LC, the analysis phase needs to know the lengths of different instructions. This information depends only on the assembly language, hence the mnemonics table can be extended to include this information in a new field called length. We refer to the processing involved in maintaining the location counter as LC processing.
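LC processing as described above can be sketched in a few lines. The source text is the program of Table 2.2; the sketch assumes that every imperative statement and every DC occupies one word and that DS reserves the number of words given by its constant. The function and variable names are illustrative.

```python
SOURCE = """\
        START 101
        READ  N
        MOVER BREG, ONE
        MOVEM BREG, TERM
AGAIN   MULT  BREG, TERM
        MOVER CREG, TERM
        ADD   CREG, ONE
        MOVEM CREG, TERM
        COMP  CREG, N
        BC    LE, AGAIN
        MOVEM BREG, RESULT
        PRINT RESULT
        STOP
N       DS    1
RESULT  DS    1
ONE     DC    '1'
TERM    DS    1
        END
"""

MNEMONICS = {"STOP", "ADD", "MULT", "MOVER", "MOVEM", "COMP", "BC",
             "READ", "PRINT", "DS", "DC", "START", "END"}


def build_symbol_table(source: str) -> dict:
    """Pass 1 LC processing: record the LC value for every label."""
    symbols = {}
    lc = 0
    for line in source.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A first field that is not a mnemonic is taken to be a label.
        label = fields.pop(0) if fields[0] not in MNEMONICS else None
        opcode, operands = fields[0], fields[1:]
        if opcode == "START":
            lc = int(operands[0])       # LC starts at the START constant
            continue
        if opcode == "END":
            break
        if label is not None:
            symbols[label] = lc         # enter (label, LC) in the table
        # Length in words: DS reserves <constant> words, all else one word.
        lc += int(operands[0]) if opcode == "DS" else 1
    return symbols


print(build_symbol_table(SOURCE))
# {'AGAIN': 104, 'N': 113, 'RESULT': 114, 'ONE': 115, 'TERM': 116}
```

The resulting addresses agree with the address column of Table 2.2.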


Self Assessment Questions

1. What do you mean by pass 1 and pass 2 of an assembler?

2. What are the design specifications of an assembler?

3. Write any assembly language program of your choice.

2.5 Tasks of Assemblers

Tasks performed by the passes of a Two Pass assembler are as follows:

Pass I

1. Separate the symbol, mnemonic opcode and operand fields.

2. Build the symbol table.

3. Perform LC processing.

4. Construct intermediate representation.

Pass II: Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate representation, while Pass II processes the intermediate representation to synthesize the target program. The design details of assembler passes are discussed after introducing advanced assembler directives and their influence on LC processing.

2.6 Translation of Assemblers

In this section we discuss the two pass and single pass assembly schemes.

Two Pass Translation

Two pass translation of an assembly language program can handle forward references easily. LC processing is performed in the first pass and symbols defined in the program are entered into the symbol table. The second pass synthesizes the target form using the address information found in the symbol table. In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass. This representation consists of two main components: data structures, e.g. the symbol table, and a processed form of the source program. The latter component is called intermediate code (IC).
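The division of labour between the passes can be illustrated with a minimal sketch. The input program is hypothetical, the intermediate code is simply a list of (address, opcode, operands) tuples, and the numeric codes are the ones visible in the Table 2.2 listing; none of this is meant as the definitive IR format.

```python
OPCODES = {"STOP": 0, "ADD": 1, "MOVER": 4, "MOVEM": 5}   # as in Table 2.2
REGISTERS = {"BREG": 2, "CREG": 3}                         # as in Table 2.2

PROGRAM = [                       # (label, opcode, operands), hypothetical
    (None, "START", ["300"]),
    (None, "MOVER", ["BREG", "X"]),   # X is a forward reference
    (None, "ADD",   ["BREG", "Y"]),   # Y is a forward reference
    (None, "MOVEM", ["BREG", "X"]),
    (None, "STOP",  []),
    ("X",  "DC",    ["5"]),
    ("Y",  "DC",    ["3"]),
    (None, "END",   []),
]


def pass_one(program):
    """Analysis: build the symbol table and the intermediate code (IC)."""
    symbols, ic, lc = {}, [], 0
    for label, opcode, operands in program:
        if opcode == "START":
            lc = int(operands[0])
            continue
        if opcode == "END":
            break
        if label:
            symbols[label] = lc
        ic.append((lc, opcode, operands))   # symbols stay unresolved here
        lc += 1                             # every statement here is one word
    return symbols, ic


def pass_two(symbols, ic):
    """Synthesis: resolve symbols and emit 'address) + oo r aaa' lines."""
    target = []
    for lc, opcode, operands in ic:
        if opcode == "DC":
            word = f"+ 00 0 {int(operands[0]):03d}"
        elif opcode == "STOP":
            word = "+ 00 0 000"
        else:
            reg, symbol = operands
            word = f"+ {OPCODES[opcode]:02d} {REGISTERS[reg]} {symbols[symbol]:03d}"
        target.append(f"{lc}) {word}")
    return target


symbols, ic = pass_one(PROGRAM)
for line in pass_two(symbols, ic):
    print(line)
```

By the time pass_two runs, X and Y are already in the symbol table, so the forward references in the first two statements need no special handling.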

Single Pass Translation

LC processing and construction of the symbol table proceed as in two pass translation. The problem of forward references is tackled using a process called backpatching. The operand field of an instruction containing a forward reference is left blank initially. The address of the forward referenced symbol is put into this field when its definition is encountered.

Table 2.2: An assembly program and the machine instructions generated for it

        START  101
        READ   N                101) + 09 0 113
        MOVER  BREG, ONE        102) + 04 2 115
        MOVEM  BREG, TERM       103) + 05 2 116
AGAIN   MULT   BREG, TERM       104) + 03 2 116
        MOVER  CREG, TERM       105) + 04 3 116
        ADD    CREG, ONE        106) + 01 3 115
        MOVEM  CREG, TERM       107) + 05 3 116
        COMP   CREG, N          108) + 06 3 113
        BC     LE, AGAIN        109) + 07 2 104
        MOVEM  BREG, RESULT     110) + 05 2 114
        PRINT  RESULT           111) + 10 0 114
        STOP                    112) + 00 0 000
N       DS     1                113)
RESULT  DS     1                114)
ONE     DC     ‘1’              115) + 00 0 001
TERM    DS     1                116)
        END

In the program of Table 2.2, the instruction corresponding to the statement MOVER BREG, ONE can be only partially synthesized since ONE is a forward reference. Hence the instruction opcode and address of BREG will be assembled to reside in location 102. The need for inserting the second operand’s address at a later stage can be indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (102, ONE) in this case.

By the time the END statement is processed, the symbol table would contain the addresses of all symbols defined in the source program and TII would contain information describing all forward references. The assembler can now process each entry in TII to complete the concerned instruction. For example, the entry (102, ONE) would be processed by obtaining the address of ONE from the symbol table and inserting it in the operand address field of the instruction with assembled address 102. Alternatively, entries in TII can be processed in an incremental manner. Thus, when the definition of some symbol symb is encountered, all forward references to symb can be processed.
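The backpatching scheme with a Table of Incomplete Instructions can be sketched as below. The short input program is hypothetical (it is not the program of Table 2.2), the numeric codes are those visible in the Table 2.2 listing, and the sketch resolves all TII entries at END rather than incrementally.

```python
OPCODES = {"STOP": 0, "ADD": 1, "MOVER": 4, "MOVEM": 5}   # as in Table 2.2
REGISTERS = {"BREG": 2, "CREG": 3}                         # as in Table 2.2

PROGRAM = [                       # (label, opcode, operands), hypothetical
    (None,     "START", ["200"]),
    (None,     "MOVER", ["BREG", "ONE"]),      # forward reference to ONE
    (None,     "ADD",   ["BREG", "ONE"]),      # forward reference to ONE
    (None,     "MOVEM", ["BREG", "RESULT"]),   # forward reference to RESULT
    (None,     "STOP",  []),
    ("RESULT", "DS",    ["1"]),
    ("ONE",    "DC",    ["1"]),
    (None,     "END",   []),
]


def assemble_single_pass(program):
    """Return (memory, tii); memory maps address -> [opcode, reg, operand]."""
    symbols, tii, memory, lc = {}, [], {}, 0
    for label, opcode, operands in program:
        if opcode == "START":
            lc = int(operands[0])
            continue
        if opcode == "END":
            break
        if label:
            symbols[label] = lc
        if opcode == "DS":
            lc += int(operands[0])          # reserve words, emit nothing
            continue
        if opcode == "DC":
            memory[lc] = [0, 0, int(operands[0])]
        elif opcode == "STOP":
            memory[lc] = [0, 0, 0]
        else:
            reg, symbol = operands
            address = symbols.get(symbol)   # None if not yet defined
            if address is None:
                tii.append((lc, symbol))    # remember the incomplete word
            memory[lc] = [OPCODES[opcode], REGISTERS[reg], address]
        lc += 1
    # END reached: every symbol is defined, so complete the instructions.
    for address, symbol in tii:
        memory[address][2] = symbols[symbol]
    return memory, tii


memory, tii = assemble_single_pass(PROGRAM)
print(tii)            # [(200, 'ONE'), (201, 'ONE'), (202, 'RESULT')]
print(memory[200])    # [4, 2, 205]
```

An incremental variant would instead scan TII for a symbol at the moment its label is entered in the symbol table.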

2.7 MASM Assembler and SPARC Assemblers

MASM: Microsoft Macro Assembler

The Microsoft Macro Assembler (MASM) is an assembler for the x86 family of microprocessors, originally produced for the Microsoft MS-DOS operating system. It supported a wide variety of macro facilities and structured programming idioms, including high-level constructions for looping, procedure calls and alternation (therefore, MASM is an example of a high-level assembler). Later versions added the capability of producing programs for the Windows operating systems that were released to follow on from MS-DOS. MASM is one of the few Microsoft development tools for which there was no separate 16-bit and 32-bit version.

Assembler affords the programmer looking for additional performance a three-pronged approach to performance-based solutions. MASM can build very small, high-performance executable files that are well suited where size and speed matter. When additional performance is required for other languages, MASM can enhance the performance of these languages with small, fast and powerful dynamic link libraries. For programmers who work in Microsoft Visual C/C++, MASM builds modules and libraries that are in the same format, so the C/C++ programmer can build modules or libraries in MASM and directly link them into their own C/C++ programs.

This allows the C/C++ programmer to target critical areas of their code in a very efficient and convenient manner: graphics manipulation, games, very high speed data manipulation and processing, fast parsing, encryption, compression and any other form of information processing that is processor intensive.

Programmers who are not familiar with 32-bit Windows assembler may find speed and performance that they have not seen before. Contrary to popular legend, if you can write a Windows application in C/C++, Basic, Pascal or another similar compiler-based language, you can write it in MASM with very similar looking code, provided you learn the MASM high-level syntax.

MASM32 has been designed to be familiar to programmers who have already written API-based code in Windows. The invoke syntax of MASM allows functions to be called in much the same way as they are called in a high-level compiler.

The traditional notation for calling a function is as follows:

push par4
push par3
push par2
push par1
call FunctionName
mov retval, eax

SPARC Assembler

SPARC (which stands for Scalable Processor ARChitecture) is an open set of technical specifications that any person or company can license and use to develop microprocessors and other semiconductor devices based on published industry standards. SPARC was invented in the labs of Sun Microsystems Inc., based upon pioneering research into Reduced Instruction Set Computing (RISC) at the University of California at Berkeley. The first standard product based on the SPARC architecture was produced by Sun and Fujitsu in 1986; Sun followed in 1987 with its first workstation based on a SPARC processor. In 1989, Sun Microsystems transferred ownership of the SPARC specifications to an independent, non-profit organization, SPARC International, which administers and licenses the technology and provides compliance testing and other services for its members. SPARC is a modern, fast, pipelined architecture. Its assembly language illustrates most of the features found in assembly languages for the variety of computer architectures which have been developed.

2.8 Summary

This unit introduced assemblers and their capabilities. An assembler is a program (system software) which accepts an assembly language program as input and produces its equivalent machine language program as output, along with information for the loader. The input to the assembler is called the source program and the output is called the object program. An assembler can be implemented in two passes, pass 1 and pass 2; the corresponding flow charts are covered in Section 2.4.2. We have also discussed pass structures and two pass assemblers in detail. Finally, we looked at two widely used examples, the MASM and SPARC assemblers. SPARC originated at Sun and MASM is a Microsoft product.

2.9 Terminal Questions

1) What are pass 1 and pass 2 of an assembler? Describe their data structures.

2) Draw the flow chart of pass 1 of an assembler.

3) What is MASM? Explain its features.

4) Write the Pass 1 and Pass 2 data structures in detail.
