52.223 Low Level Programming Lecturer: Duncan Smeed

Overview of IA-32 Assembly Language Programming

Part 1

Overview of IA-32 Assembly Language Programming - Part 1, 52223_ALP/2

Program Translation Hierarchy

[Figure: program translation hierarchy, with the assembly language programming level marked]

An Assembly Language Program: Global View

Typically, an Assembly Language Program (ALP) is divided into three sections that specify the main components of a program. In some cases these sections can be intermixed to provide for better design and structure. These sections are:

• Assembler Directives (aka Pseudo-ops)

• Assembly Language Instructions

• Data Storage Directives

Assembler Directives (Pseudo-ops)

These are directives supplied by the user to the assembler for defining data and symbols, setting assembler and linking conditions, specifying output formats, etc. The directives do not produce machine code. Examples:

DOSSEG - Specifies a standard segment order for the code, data and stack segments.

PROC - Marks the start of a procedure; in a simple program this is the main procedure, which holds the first executable instruction (the program entry point).

END - Program End. This informs the assembler that the program source is finished.

Assembly Language Instructions

These are the actual IA-32 instructions that are translated into executable machine code. Examples:

MOV [operands] ; to move data, e.g. memory to register
ADD [operands] ; to add two data values
AND [operands] ; to logically AND two data values

Data Storage Directives

Also known as Data Definition Directives. These allocate data storage locations containing initialised or uninitialised data. Examples:

db "Good afternoon",0
db 20 dup(0) ; 20 bytes, all zeroed
db 20 dup(?) ; 20 uninitialised bytes
dw ?,?,?,?,? ; 5 uninitialised words

Format of Assembly Language Statements

In general an assembly language (AL) statement can contain up to four fields, namely:

[name] [mnemonic] [operand(s)] [comment]

• name identifies a label, variable, constant (symbol) or keyword.
• mnemonic identifies the AL instruction (opcode) or an assembler directive.
• operand(s) identifies the operand(s) for the mnemonic.
• comment signifies AL commentary/documentation.

[name]

This field identifies a label, variable, constant or keyword.

• Label - When a name appears next to a program instruction, it is called a label. Labels serve as place markers to be used as, for example, an address reference in a jump instruction:

  jmp endif_01

• Variable - A name used before a data allocation directive identifies a location where data resides in memory. E.g.:

  count1 db 50 ; the variable count1

• Constant - A name used to define a constant. E.g.:

  max_col equ 80 ; the constant max_col

• Keyword - A keyword, or reserved word, has some predefined meaning to the assembler. It may be an instruction mnemonic or an assembler directive. Keywords cannot be used out of context or as identifiers:

  add mov ax,10 ; illegal use of add as label

[mnemonic]

This field contains the mnemonic of:
• an instruction opcode (e.g. MOV, ADD) or,
• a pseudo-op (e.g. DB, EQU)

To distinguish labelled statements from unlabelled ones, one of two rules applies (depending on the assembler):
• the mnemonic field of an unlabelled statement must not start in the first column, since that's where labels start, or
• labels must have an identifying character - often a ':' suffix - to differentiate them from other fields.

E.g. the following code uses both types of formatting for illustration (but note most assemblers use just one style or the other):

         jmp endif_01    ; 'tabbed in' statement
else_01: mov ax,10       ; 'suffix :' style label
endif_01 add dx,ax       ; 'column 1' style label

[operand(s)]

For instructions or pseudo-ops that require operands, this field contains one or more operands, typically separated by commas (e.g. registers, or addresses of data to be operated upon by the instruction in the mnemonic (op-code) field). Examples:

ax
'A'
ax,100
[200],bx
dx,[bx]
[bx+si],cx
ax,[bx+si+2]

[comment]

The remainder of the statement is the comment field. Some assemblers require this field to start with a special character, such as ';' or '#'.

Comments in the program are for documentation purposes only and are ignored by the assembler.

Such comments are absolutely vital when programming in AL since there is such a large semantic gap between the design of a program/algorithm at a high level and its implementation at such a low level.
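
The four-field layout can be illustrated with a small Python sketch that splits one statement into its fields. It is not how any particular assembler parses source: the helper name split_statement is hypothetical, and it assumes the ':' suffix label style and that ';' never appears inside a string operand.

```python
def split_statement(line):
    """Split one statement into (name, mnemonic, operands, comment).

    Simplifying assumptions (not from the lecture): labels carry a ':'
    suffix, and ';' never appears inside a string literal.
    """
    code, _, comment = line.partition(";")   # comment = rest of the line
    tokens = code.split(None, 1)             # first word, then the remainder
    name = ""
    if tokens and tokens[0].endswith(":"):
        name = tokens[0][:-1]                # drop the ':' suffix
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    mnemonic = tokens[0] if tokens else ""
    operands = tokens[1].strip() if len(tokens) > 1 else ""
    return name, mnemonic, operands, comment.strip()


print(split_statement("else_01: mov ax,10 ; load ten"))
print(split_statement("        jmp endif_01"))
print(split_statement("; a comment-only line"))
```

Each call returns a 4-tuple; fields that are absent come back as empty strings.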

Comment-only Statements

The exception to the format of [name] [mnemonic] [operand(s)] [comment] is that if a line starts with a special comment-line character then the whole line is treated as a comment:

; This is an example of a comment-only line. If you ever

; write AL programs then such comment lines should

; ideally outnumber code lines by a significant factor!

; IOW, AL is a write-only language ;-)

; Incidentally, the following in-line comment is almost

; worthless!!:

mov ax,10 ; move the value 10 into AX

Field Separators

In general, the fields are separated by spaces and if the label field is NOT present it must be replaced by at least one space. To improve the appearance of the program it is wise to position the fields at particular column positions (e.g. at tab stops). For example, contrast the following two programs - one with an untidy layout and the other with a neat layout.

;1) Untidily laid out
; example program
 mov ax,[150]
 mov bx,[152]
 add ax,2
 mov [154],ax
 mov [150], bx
 int 20

;2) Neatly laid out
; example program
        mov     ax,[150]        ; blah
        mov     bx,[152]        ; blah blah
        add     ax,2            ; wibble
        mov     [154],ax        ; ...wibble
        mov     [150],bx        ; blah
        int     20              ; End program

Data Definition Directives Revisited

Variables are really just symbolic names for locations in memory where data is stored. In assembly language, (global) variables are identified by labels.

A label does not, however, indicate how many bytes of storage are allocated to a variable - it is, in effect, the address of the first byte of a data structure.

The following syntax diagram shows that label is optional, and only one initialvalue is required. If more are supplied, they must be separated by commas:

[label] <directive> initialvalue [,initialvalue]

…Data Definition Directives

Data definition directives are used to allocate storage and include the following pre-defined types:

Directive   Defines      Bytes
DB          Byte         1
DW          Word         2
DD          Doubleword   4
DQ          Quadword     8
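
As a quick check on the table, the following Python sketch computes how much storage a directive reserves for a given number of initial values. The SIZES mapping and the storage_bytes helper are illustrative names, not part of any assembler.

```python
# Bytes allocated per initial value, taken from the table above.
SIZES = {"db": 1, "dw": 2, "dd": 4, "dq": 8}


def storage_bytes(directive, value_count):
    """Total bytes reserved by one directive holding value_count values."""
    return SIZES[directive.lower()] * value_count


print(storage_bytes("dw", 5))    # dw ?,?,?,?,?
print(storage_bytes("DB", 20))   # db 20 dup(?)
print(storage_bytes("dd", 1))    # dd ?
```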

DB - Define Byte

The DB directive allocates storage for one or more 8-bit values:

[label] DB initialvalue [,initialvalue]

Initialvalue can be one or more 8-bit values, a string constant, a constant expression (evaluated at assembly time), or a question mark (?). If the value is signed, it has the range -128 to +127; if unsigned, the range is 0 to 255. Here are a few examples:

char  db 'A'  ; ASCII character
min_s db -128 ; min. signed value
max_s db +127 ; max. signed value
min_u db 0    ; min. unsigned value
max_u db 255  ; max. unsigned value
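
The signed and unsigned ranges overlap because both describe the same 8 bits. The Python sketch below models the byte that a DB with an integer operand would store, assuming two's complement encoding; the function name db is borrowed from the directive purely for illustration.

```python
import struct


def db(value):
    """Return the single byte a DB directive would store for an integer.

    Accepts -128..255, mirroring the signed and unsigned ranges above.
    """
    if not -128 <= value <= 255:
        raise ValueError("value does not fit in a byte")
    return struct.pack("B", value & 0xFF)   # two's complement bit pattern


for v in (-128, 127, 0, 255, -1):
    print(f"db {v:>4} -> {db(v).hex()}h")
```

Note that -1 and 255 produce the same byte (FFh); whether it is treated as signed or unsigned depends on the instructions that later use it.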

… DB - Define Byte

Each value may also be expressed in a different radix. For example, the following variables all contain exactly the same value. Which radix to use is entirely up to the programmer but is usually chosen to reinforce the context of its use, i.e. if a value is to be treated in a 'character' context then the definition reflects that. Thus:

char_version db 'A'       ; ASCII character
hex_version  db 41h       ; as hexadecimal
dec_version  db 65        ; as decimal
bin_version  db 01000001b ; as binary
oct_version  db 101q      ; as octal
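
To confirm that the five definitions above hold the same value, here is a minimal Python sketch that converts each literal to an integer. parse_literal is a hypothetical helper covering only the notations shown on this slide (single-character constant, h, b and q suffixes, plain decimal).

```python
def parse_literal(text):
    """Convert one of the slide's byte literals to an int (illustrative only)."""
    text = text.strip()
    if len(text) == 3 and text[0] == text[-1] == "'":
        return ord(text[1])                  # character constant, e.g. 'A'
    suffix = text[-1].lower()
    if suffix == "h":
        return int(text[:-1], 16)            # hexadecimal, e.g. 41h
    if suffix == "b":
        return int(text[:-1], 2)             # binary, e.g. 01000001b
    if suffix == "q":
        return int(text[:-1], 8)             # octal, e.g. 101q
    return int(text)                         # plain decimal


for literal in ("'A'", "41h", "65", "01000001b", "101q"):
    print(f"{literal:>10} = {parse_literal(literal)}")
```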

… DB - Define Byte

A list of values may be grouped under a single label, with the values separated by commas. In the following example, list1 and list2 have the same contents:

list1 db 10, 32, 41h, 00100010b
list2 db 0Ah,20h,'A',22h

A variable's contents may be left undefined by using the question mark (?) operator. Alternatively, a numeric expression can initialise a variable with a value that is calculated at assembly time. Examples:

count     db ?
ages      db ?,?,?,?,?
scrn_size dw 80*24 ; 1920 does not fit in a byte, so a word is used

…DB - Define Byte

A string may be assigned to a variable, in which case the variable (label) stands for the address of the first byte.

C_string      db "Good morning",0
pascal_string db 12,"Good morning"

Long strings can be made more readable in an AL source program by continuing them over multiple lines without the necessity of supplying a label for each. The following string is terminated by an end-of-line sequence and a null byte:

a_long_string db "This is a string "
              db "that clearly is going to take "
              db "several lines to store in an "
              db "assembly language program."
              db 0Dh,0Ah,0 ; EOL sequence + NULL
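
The two string conventions differ only in where the length information lives. The Python sketch below builds the same byte sequences as C_string and pascal_string and recovers the length from each; the helper names c_length and pascal_length are made up for this illustration.

```python
# Byte images equivalent to the two definitions above.
c_string = b"Good morning" + b"\x00"            # db "Good morning",0
pascal_string = bytes([12]) + b"Good morning"   # db 12,"Good morning"


def c_length(buf):
    """Length of a null-terminated string: scan for the first zero byte."""
    return buf.index(0)


def pascal_length(buf):
    """Length of a length-prefixed string: read the first byte."""
    return buf[0]


print(c_length(c_string), pascal_length(pascal_string))
print(len(c_string), len(pascal_string))   # both occupy 13 bytes of storage
```

The null-terminated form needs a scan to find its length, while the length-prefixed form reads it directly but is limited to 255 characters when the prefix is a single byte.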

$ Operator

The assembler can automatically calculate the length of a string by making use of the $ operator which represents the assembler's current location counter value. In the following example, a_string_len is initialised to 16:

a_string db "This is a string"

a_string_len db $-a_string
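
The following Python sketch is a toy model of the location counter, not a real assembler: the DataSegment class and its here property (standing in for $) are assumptions made for illustration. It shows why $-a_string yields the string length when the expression immediately follows the string.

```python
class DataSegment:
    """Toy model of an assembler's location counter for DB directives."""

    def __init__(self):
        self.image = bytearray()   # bytes emitted so far
        self.symbols = {}          # label -> offset

    @property
    def here(self):
        """Current location counter value; plays the role of '$'."""
        return len(self.image)

    def db(self, label, data):
        if label:
            self.symbols[label] = self.here
        self.image += data


seg = DataSegment()
seg.db("a_string", b"This is a string")
length = seg.here - seg.symbols["a_string"]   # $ - a_string
seg.db("a_string_len", bytes([length]))
print(length)
```

If any other data were defined between the string and the length expression, the result would include those bytes too, which is why the two lines are kept adjacent.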

DW - Define Word

The DW directive creates storage for one or more 16-bit words. The syntax is:

[label] DW initialvalue [,initialvalue]

Initialvalue can be any 16-bit value from 0 to 65,535 (FFFFh) or -32,768 (8000h) to +32,767 (7FFFh) if signed, a constant expression (evaluated at assembly time), or a question mark (?) to leave a variable uninitialised.

DW and Near Pointers

The offset of a variable or subroutine may be stored in another variable. In the next example, the assembler sets listPtr to the offset of list. Then listPtrPtr contains the address of listPtr. Finally, aProcPtr contains the offset of a label called clear_screen.

list dw 256,257,258,259

listPtr dw list

listPtrPtr dw listPtr

aProcPtr dw clear_screen
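
A rough Python model of the data definitions above may help to show what the pointer words contain. It assumes the data starts at offset 0 of its segment and omits aProcPtr because no code is modelled; the dw and word_at helpers are illustrative names only.

```python
import struct

image = bytearray()   # bytes of the data segment, assumed to start at offset 0
symbols = {}          # label -> offset


def dw(label, *values):
    """Record the label's offset, then append each value as a 16-bit word."""
    symbols[label] = len(image)
    for v in values:
        image.extend(struct.pack("<H", v))


def word_at(offset):
    """Read the 16-bit little-endian word stored at the given offset."""
    return struct.unpack_from("<H", image, offset)[0]


dw("list", 256, 257, 258, 259)
dw("listPtr", symbols["list"])          # holds the offset of list
dw("listPtrPtr", symbols["listPtr"])    # holds the offset of listPtr

ptr = word_at(symbols["listPtrPtr"])    # first hop: offset of listPtr
first = word_at(word_at(ptr))           # second hop: first word of list
print(symbols, ptr, first)
```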

DD - Define Doubleword

The DD directive creates storage for one or more 32-bit doublewords. The syntax is:

[label] DD initialvalue [,initialvalue]

Initialvalue can be any 32-bit value up to FFFFFFFFh, a segment-offset address, a 4-byte encoded real number, or a decimal real number. The bytes are stored in little-endian format, i.e. the value 12345678h would be stored in memory as:

memory address (offset):  00 01 02 03
contents:                 78 56 34 12
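
The byte order can be checked with Python's standard struct module, where the "<I" format means a little-endian unsigned 32-bit integer. This is only a demonstration of the storage order, not of the assembler itself.

```python
import struct

# dd 12345678h, stored least significant byte first
stored = struct.pack("<I", 0x12345678)
print(" ".join(f"{b:02X}" for b in stored))

# Reading the four bytes back as a little-endian doubleword restores the value.
value = struct.unpack("<I", stored)[0]
print(f"{value:08X}h")
```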

…DD - Define Doubleword

You can define either a single doubleword or a list of doublewords. In the example that follows, far_pointer1 is uninitialised and the assembler automatically initialises far_pointer2 to the 32-bit segment-offset address of subroutine1:

signed_val   dd -2147483648
far_pointer1 dd ?
far_pointer2 dd subroutine1

DUP Operator

The DUP operator only appears after a storage allocation directive (DB, DW,...). DUP allows for the repetition of one or more values when allocating storage. This is especially useful when allocating space for a table or array. For example:

db 20 dup(0)    ; 20 bytes, all zeroed
db 20 dup(?)    ; 20 uninitialised bytes
db 4 dup('ABC') ; 12 bytes: 'ABCABCABCABC'

The DUP operator may also be nested. The first example below creates storage containing (in ASCII) 000XX000XX000XX000XX, i.e. the 5-byte pattern 000XX repeated four times. The second example creates a 2-dimensional word table of 3 rows by 4 columns:

aTable  db 4 dup( 3 dup('0'), 2 dup('X') )
anArray dw 3 dup( 4 dup(0) )
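
Nested DUP expressions can be expanded by hand, but a short Python model makes the result easy to verify. The dup function below is a hypothetical stand-in for the operator: it takes a count and a sequence of items (byte strings or small integers) and repeats that sequence.

```python
def dup(count, *items):
    """Model of 'count DUP(items)': repeat the item sequence count times."""
    unit = b"".join(i if isinstance(i, bytes) else bytes([i]) for i in items)
    return unit * count


# aTable db 4 dup( 3 dup('0'), 2 dup('X') )
a_table = dup(4, dup(3, b"0"), dup(2, b"X"))
print(a_table.decode("ascii"), len(a_table))

# anArray dw 3 dup( 4 dup(0) ): each word is modelled as two zero bytes
an_array = dup(3, dup(4, b"\x00\x00"))
print(len(an_array), "bytes =", len(an_array) // 2, "words")
```

The inner expression is evaluated first to form one unit, which the outer count then repeats.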

Type Checking

When a variable is created using DB, DW, etc., the assembler gives it a default attribute (byte, word, etc.) based on its size. This type is checked on referencing the variable and an error results if the types do not match. So:

count dw 20h

...

mov al,count ;error: type mismatch

…Type Checking

One way around the type check is the LABEL directive, which creates a new name (and associated type) at the same address. Thus:

count_lo label byte ; byte attribute

count dw 20h ; word attribute

...

mov al,count_lo ; use low byte of count

mov cx,count ; use all of count
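
What the two names refer to can be modelled in Python as two views of the same two bytes of memory. This is a sketch of the idea only; it relies on the word being stored little-endian, so the byte at the shared address is the low byte.

```python
import struct

# count dw 20h: one 16-bit word, stored little-endian
memory = bytearray(struct.pack("<H", 0x0020))

# count_lo label byte: same address, but viewed as a single byte
count_lo = memory[0]                             # mov al,count_lo
count = struct.unpack_from("<H", memory, 0)[0]   # mov cx,count

print(f"count_lo = {count_lo:02X}h, count = {count:04X}h")
print("bytes in memory:", " ".join(f"{b:02X}" for b in memory))
```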

Addressing Modes Revisited

As we have seen, an instruction consists of:
a) the op-code that tells the processor what instruction to perform and,
b) the operand or address field which tells the processor where to find the data to be operated upon. This address is known as the Effective Address (EA).

To determine the EA, the processor uses one of a number of addressing modes that are defined by the operand field of the instruction. Getting the EA from the addressing mode may be quite simple (e.g. the operand is [the contents of] a data register) or complex (e.g. the operand is in memory, the address of which is contained in an address register). [See 52223_02/16-34 for details of the IA-32 AMs.]
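
For memory operands of the [base+index+displacement] form used in the operand examples earlier, the EA calculation can be sketched in Python as below. The function is a simplified illustration that assumes 16-bit (real mode) addressing, in which the sum wraps around at 64K; it ignores segment registers entirely.

```python
def effective_address(base=0, index=0, displacement=0):
    """16-bit effective address: (base + index + displacement) mod 64K."""
    return (base + index + displacement) & 0xFFFF


bx, si = 0x150, 2
print(f"[bx]      -> {effective_address(base=bx):04X}h")
print(f"[bx+si]   -> {effective_address(base=bx, index=si):04X}h")
print(f"[bx+si+2] -> {effective_address(bx, si, 2):04X}h")
print(f"[200]     -> {effective_address(displacement=0x200):04X}h")
```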

Aside: Lecture Notes Archive

Further examples of AMs, etc., of the IA-16 subset of IA-32 can be found in my lecture notes archive at:

<http://www.cis.strath.ac.uk/~dunc/cdrom/archives/ay2000/teaching/llp/lectures/odd/part2.html>

DEBUG

DEBUG is included as part of the standard Windows installation.

DEBUG is a DOS-mode debugger, which means:
• It's of no use for debugging Win32 applications
• But it is useful to explore the wonderful(!) world of the IA-16 (real mode) subset of IA-32.

An overview of DEBUG can be found in my lecture notes archive at:

<http://www.cis.strath.ac.uk/~dunc/cdrom/archives/ay2000/teaching/llp/practicals/debug.html>

H:/llp/p1>debug

Statement        Comment
A 150            Assemble data at offset 150h
db 10,20,30,0    1st 3 bytes are array, last is sum
<ENTER>          ends assembly
A 100            Assemble code at offset 100h
mov bx,150       BX points to the array
mov si,2         SI will be an index
mov al,[bx]      Indirect operand
add al,[bx+1]    Base-offset operand
add al,[bx+si]   Base-indexed operand
mov [153],al     Direct operand
int 20           End program
<ENTER>          ends assembly
T                Trace each instruction
…                [trace output appears…]
D 150,153        Dump array and sum
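
The effect of the session above can be traced without DEBUG using a small Python simulation. It models memory as a byte array and the registers as ordinary variables, and it treats every number as hexadecimal, as DEBUG does. It is a sketch of the arithmetic only and ignores flags and segment registers.

```python
memory = bytearray(0x200)                                # a small block of memory
memory[0x150:0x154] = bytes([0x10, 0x20, 0x30, 0x00])    # db 10,20,30,0 (hex)

bx = 0x150                             # mov bx,150      BX points to the array
si = 2                                 # mov si,2        SI is the index
al = memory[bx]                        # mov al,[bx]     indirect operand
al = (al + memory[bx + 1]) & 0xFF      # add al,[bx+1]   base-offset operand
al = (al + memory[bx + si]) & 0xFF     # add al,[bx+si]  base-indexed operand
memory[0x153] = al                     # mov [153],al    direct operand

# Equivalent of "D 150,153": dump the array and the sum
print(" ".join(f"{b:02X}" for b in memory[0x150:0x154]))
```

The final byte of the dump is the sum of the three array elements, 10h + 20h + 30h = 60h.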

References & Bibliography

Duncan's Archived 52.223 Lecture Notes
<http://www.cis.strath.ac.uk/~dunc/cdrom/archives/ay2000/teaching/llp/>

sandpile.org -- IA-32 architecture
<http://www.sandpile.org/ia32/index.htm>

PC Assembly Language
<http://www.drpaulcarter.com/pcasm/>

Linux Assembly HOWTO
<http://www.faqs.org/docs/Linux-HOWTO/Assembly-HOWTO.html>

Inline Assembly with DJGPP
<http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html>

docs.sun.com: IA-32 Assembly Language Reference Manual
<http://docs.sun.com/app/docs/doc/806-3773/6jct9o0ad?a=view>

Pentium Assembly Code Using gcc
<http://william.krieger.faculty.noctrl.edu/archive/c2003_09_csc220/assembly/>

Microsoft Windows XP - Debug
<http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/debug.mspx>