BT 0068
Computer Organization and Architecture
Contents
Unit 1: Data Representation in Computers
Unit 2: Register Transfer and Microoperations
Unit 3: Basic Structure of a Digital Computer
Unit 4: CPU and Register Organization
Unit 5: Interconnection Structures
Unit 6: Instruction Sets: Addressing Modes and Formats
Unit 7: Arithmetic Logic Unit
Unit 8: Binary Arithmetic
Unit 9: Memory Unit – Part I
Unit 10: Memory Unit – Part II
Unit 11: Input / Output Basics
Unit 12: Direct Memory Access
References
Prof. V. B. Nanda Gopal
Director & Dean, Directorate of Distance Education
Sikkim Manipal University of Health, Medical & Technological Sciences (SMU DDE)

Board of Studies
Dr. U. B. Pavanaja (Chairman), General Manager – Academics, Manipal Universal Learning Pvt. Ltd., Bangalore
Nirmal Kumar Nigam, HOP – IT, Sikkim Manipal University – DDE, Manipal
Prof. Bhushan Patwardhan, Chief Academics, Manipal Education, Bangalore
Dr. A. Kumaran, Research Manager (Multilingual), Microsoft Research Labs India, Bangalore
Dr. Harishchandra Hebbar, Director, Manipal Centre for Info. Sciences, Manipal
Ravindranath P. S., Director (Quality), Yahoo India, Bangalore
Dr. N. V. Subba Reddy, HOD – CSE, Manipal Institute of Technology, Manipal
Dr. Ashok Kallarakkal, Vice President, IBM India, Bangalore
Dr. Ashok Hegde, Vice President, MindTree Consulting Ltd, Bangalore
H. Hiriyannaiah, Group Manager, EDS Mphasis, Bangalore
Dr. Ramprasad Varadachar, Director, Computer Studies, Dayanand Sagar College of Engg., Bangalore

Content Preparation Team
Content Writing: Mr. Balasubramani R, Assistant Professor, Dept. of IT, Sikkim Manipal University – DDE, Manipal
Content Editing: Dr. E. R. Naganathan, Professor & HOD – IT, Sikkim Manipal University – DDE, Manipal
Language Editing: Ms. Chandrika P.S., HOD – English, Sharada P.U. College, Mangalore
Edition: Spring 2009
This book is a distance education module comprising a collection of learning material for our students. All rights reserved. No part of this work may be reproduced in any form by any means without permission in writing from Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim. Printed and published on behalf of Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim by Mr.Rajkumar Mascreen, GM, Manipal Universal Learning Pvt. Ltd., Manipal – 576 104. Printed at Manipal Press Limited, Manipal.
SUBJECT INTRODUCTION

The goal of this book is to introduce the basic concepts of computer
architecture and organization, in order to allow computer scientists to
recognize when programs are not as efficient as they could be and to
transform them so that they make better use of the underlying machine.
Another, closely related, goal is to provide the necessary background in
computer architecture to evaluate competing algorithms to decide which is
likely to be the most efficient for a given machine, even before they are
expressed in a programming language.
Unit 1: This unit discusses the basic building blocks of a digital computer
and the different data representations used in computers. It also briefly
explains different codes.
Unit 2: This unit explains the Register Transfer Language, gives an
overview of bus and memory transfers, and explains the different
microoperations in detail.
Unit 3: This unit discusses the Von Neumann architecture of the computer
and also provides input on the early evolution of computers.
Unit 4: This unit provides detailed coverage of the register organization of a
basic computer, with special reference to the register organization of the
Intel 8085 microprocessor and the Motorola and Zilog machines.
Unit 5: This unit discusses the data transfers between the different modules
of a system, such as the I/O module and the memory module. It explains the
structure, elements and functions of the interconnection entity called a bus.
Here we also study different structures of the CPU with single-bus or
two-bus systems.
Unit 6: This unit explains instruction characteristics and types, and the
different data types supported by machines such as the VAX and IBM
systems. It then covers different operations, such as data transfer,
arithmetic and logical operations, and transfer-of-control operations, with
examples. It also covers the different addressing modes along with the
formats and lengths of instructions. Finally, it explains the stacks and
subroutines that are necessary for the execution of certain instructions,
such as transfer of control.
Unit 7: This unit discusses the main entity of the CPU, the arithmetic logic
unit. It also introduces different number representations.
Unit 8: This unit discusses different operations like addition, subtraction,
multiplication and division of numbers.
Unit 9: This unit discusses the Memory Unit, dealing with internal and
external memory. It also deals with the system memory considerations,
structures and principle of the cache memory.
Unit 10: This unit briefly touches upon virtual memory and memory
management in the operating system.
Unit 11: This unit discusses input/output, which is responsible for
interaction with the peripherals. It also discusses external devices and
different I/O functions, and explains in detail the programmed and
interrupt-driven I/O techniques.
Unit 12: This unit covers the basics of direct memory access (DMA). Here
we also give the detailed working of the DMA controller and its
synchronization requirements with interrupts.
Computer Organization and Architecture Unit 1
Sikkim Manipal University - DDE Page No. 1
Unit 1 Data Representation in Computers
Structure:
1.1 Introduction
Objectives
1.2 Digital Computers
1.3 Data Types
1.4 Complements
1.5 Fixed-Point Representation
1.6 Floating-Point Representation
1.7 Other Binary Codes
1.8 Summary
1.9 Terminal Questions
1.10 Answers
1.1 Introduction
The digital computer is a digital system that performs various computational
tasks. The word digital implies that the information in the computer is
represented by variables that take a limited number of discrete values.
These values are processed internally by components that can maintain a
limited number of discrete states. The decimal digits 0, 1, 2, …, 9, for example,
provide 10 discrete values. The first electronic digital computers, developed
in the late 1940s, were used primarily for numerical computations. In this
case the discrete elements are the digits. From this application the term
digital computer has emerged. In practice, digital computers function more
reliably if only two states are used. Because of the physical restriction of
components, and because human logic tends to be binary (i.e. true-or-false,
yes-or-no statements), digital components that are constrained to take
discrete values are further constrained to take only two values and are said
to be binary.
Objectives:
After studying this unit, the learner will be able to:
• explain the various units of a digital computer
• understand different data types
• explain fixed-point and floating-point number representation
• discuss various binary and error-detection codes
1.2 Digital Computers
Digital computers use the binary number system, which has two digits:
0 and 1. A binary digit is called a bit. Information is represented in digital
computers in groups of bits. By using various coding techniques, groups of
bits can be made to represent not only binary numbers but also other
discrete symbols, such as decimal digits or letters of the alphabet. By the
judicious use of binary arrangements and by using various coding
techniques, the groups of bits are used to develop complete sets of
instructions for performing various types of computations.
In contrast to the common decimal numbers that employ the base 10
system, binary numbers use a base 2 system with two digits: 0 and 1. The
decimal equivalent of a binary number can be found by expanding it into a
power series with a base of 2. For example, the binary number 1001011
represents a quantity that can be converted to a decimal number by
multiplying each bit by the base 2 raised to an integer power as follows:
1 x 2^6 + 0 x 2^5 + 0 x 2^4 + 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = 75
The seven bits 1001011 represent a binary number whose decimal
equivalent is 75. However, this same group of seven bits represents the
letter K when used in conjunction with a binary code for the letters of the
alphabet. It may also represent a control code for specifying some decision
logic in a particular digital computer. In other words, groups of bits in a
digital computer are used to represent many different things. This is similar
to the concept that the same letters of an alphabet are used to construct
different languages, such as English and French.
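To make the weighted-sum interpretation concrete, the expansion above can be sketched in Python (the function and its name are illustrative additions, not part of the original unit):

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of binary digits to its decimal value by
    weighting each bit with the appropriate power of 2 (Horner form)."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)  # same sum as 1 x 2^6 + ... + 1 x 2^0
    return value

print(binary_to_decimal("1001011"))  # 75
```

Python's built-in `int("1001011", 2)` performs the same conversion.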
A computer system is sometimes subdivided into two functional entities:
hardware and software. The hardware of the computer consists of all the
electronic components and electromechanical devices that comprise the
physical entity of the device. Computer software consists of the instructions
and the data that the computer manipulates to perform various data-
processing tasks. A sequence of instructions for the computer is called a
program. The data that are manipulated by the program constitute the
database.
A computer system is composed of its hardware and the system software
available for its use. The system software of a computer consists of a
collection of programs whose purpose is to make more effective use of the
computer. The programs included in a system software package are
referred to as the operating system. They are distinguished from
application programs written by the user for the purpose of solving particular
problems. For example, a high-level language program written by a user to
solve particular data-processing needs is an application program, but the
compiler that translates the high-level language program to machine
language is a system program. The customer who buys a computer
system would need, in addition to the hardware, any available software
needed for effective operation of the computer. The system software is an
indispensable part of a total computer system. Its function is to compensate
for the differences that exist between user needs and the capability of the
hardware.
The hardware of the computer is usually divided into three major parts as
shown in Fig. 1.1.
Fig. 1.1: Block diagram of a digital computer
The Central Processing Unit (CPU) contains an Arithmetic and Logic
Unit (ALU) for manipulating data, a number of registers for storing data, and
control circuits for fetching and executing instructions. The memory of a
computer contains storage for instructions and data. It is called a Random
Access Memory (RAM) because the CPU can access any location in
memory at random and retrieve the binary information within a fixed interval
of time. The Input-Output Processor (IOP) contains electronic circuits for
communicating and controlling the transfer of information between the
computer and the outside world. The input and output devices connected to
the computer include keyboards, printers, terminals, magnetic disk drives,
and other communication devices.
This book provides the basic knowledge necessary to understand the
hardware operations of a computer system. The subject is sometimes
considered from three different points of view, depending on the interest of
the investigator. When dealing with computer hardware it is customary to
distinguish between what is referred to as computer organization, computer
design, and computer architecture.
Computer organization is concerned with the way the hardware
components operate and the way they are connected together to form the
computer system. The various components are assumed to be in place and
the task is to investigate the organizational structure to verify that the
computer parts operate as intended.
Computer design is concerned with the hardware design of the computer.
Once the computer specifications are formulated, it is the task of the
designer to develop hardware for the system. Computer design is
concerned with the determination of what hardware should be used and how
the parts should be connected. This aspect of computer hardware is
sometimes referred to as computer implementation.
Computer architecture is concerned with the structure and behavior of the
computers as seen by the user. It includes the information formats, the
instruction set, and techniques for addressing memory. The architectural
design of a computer system is concerned with the specifications of the
various functional modules, such as processors and memories, and
structuring them together into a computer system.
1.3 Data Types
Binary information in digital computers is stored in memory or processor
registers. Registers contain either data or control information. Control
information is a bit or a group of bits used to specify the sequence of
command signals needed for manipulation of the data in other registers.
Data are numbers and other binary-coded information that are operated on,
to achieve required computational results.
The data types found in the registers of digital computers may be classified
as being one of the following categories: (1) numbers used in arithmetic
computations, (2) letters of the alphabet used in data processing, and
(3) other discrete symbols used for specific purposes. All types of data,
except binary numbers, are represented in computer registers in binary-
coded form. This is because registers are made up of flip-flops and flip-
flops are two-state devices that can store only 1’s and 0’s. The binary
number system is the most natural system to be used in a digital computer.
But sometimes it is convenient to employ different number systems,
especially the decimal number system, since it is used by people to perform
arithmetic computations.
Number Systems
A number system of base, or radix, r is a system that uses distinct symbols
for r digits. Numbers are represented by a string of digit symbols. To
determine the quantity that the number represents, it is necessary to
multiply each digit by an integer power of r and then form the sum of all
weighted digits. For example, the decimal number system in everyday
use employs the radix 10 system. The 10 symbols are 0, 1, 2, 3, 4, 5, 6, 7,
8, and 9. The string of digits 724.5 is interpreted to represent the quantity
7 x 10^2 + 2 x 10^1 + 4 x 10^0 + 5 x 10^-1
that is, 7 hundreds, plus 2 tens, plus 4 units, plus 5 tenths. Every decimal
number can be similarly interpreted to find the quantity it represents.
The binary number system uses the radix 2. The two digit symbols used
are 0 and 1. The string of digits 101101 is interpreted to represent the
quantity
1 x 2^5 + 0 x 2^4 + 1 x 2^3 + 1 x 2^2 + 0 x 2^1 + 1 x 2^0 = 45
To distinguish between different radix numbers, the digits will be enclosed in
parentheses and the radix of the number inserted as a subscript. For
example, to show the equality between decimal and binary forty-five we will
write (101101)₂ = (45)₁₀.
Besides the decimal and binary number systems, the octal (radix 8) and
hexadecimal (radix 16) are important in digital computer work. The eight
symbols of the octal system are 0, 1, 2, 3, 4, 5, 6, and 7. The 16 symbols of
the hexadecimal system are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.
The last six symbols are, unfortunately, identical to the letters of the
alphabet and can cause confusion at times. However, this is the convention
that has been adopted. When used to represent hexadecimal digits, the
symbols A, B, C, D, E, F correspond to the decimal numbers 10, 11, 12, 13,
14, 15, respectively.
A number in radix r can be converted into the familiar decimal system by
forming the sum of the weighted digits. For example, octal 736.4 is
converted to decimal as follows:
(736.4)₈ = 7 x 8^2 + 3 x 8^1 + 6 x 8^0 + 4 x 8^-1
= 7 x 64 + 3 x 8 + 6 x 1 + 4 / 8 = (478.5)₁₀
The equivalent decimal number of hexadecimal F3 is obtained from the
following calculation:
(F3)₁₆ = F x 16 + 3 = 15 x 16 + 3 = (243)₁₀
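The weighted-digit rule for an arbitrary radix, including a fraction part, can be sketched in Python (the function and its name are illustrative additions):

```python
def radix_to_decimal(number: str, r: int) -> float:
    """Convert a string in base r (optionally with a radix point)
    to decimal by summing the weighted digits."""
    digits = "0123456789ABCDEF"
    integer, _, fraction = number.upper().partition(".")
    value = 0.0
    for d in integer:                 # integer digits carry powers r^(n-1) ... r^0
        value = value * r + digits.index(d)
    weight = 1.0 / r
    for d in fraction:                # fraction digits carry powers r^-1, r^-2, ...
        value += digits.index(d) * weight
        weight /= r
    return value

print(radix_to_decimal("736.4", 8))   # 478.5
print(radix_to_decimal("F3", 16))     # 243.0
```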
Conversion from decimal to its equivalent representation in the radix r
system is carried out by separating the number into its integer and fraction
parts and converting each part separately. The conversion of a decimal
integer into a base r representation is done by successive divisions by r and
accumulation of the remainders. The conversion of a decimal fraction to
radix r representation is accomplished by successive multiplications by r
and accumulation of the integer digits so obtained. Fig.1.2 demonstrates
these procedures.
Integer = 41                     Fraction = 0.6875
41 ÷ 2 = 20  remainder 1         0.6875 x 2 = 1.3750
20 ÷ 2 = 10  remainder 0         0.3750 x 2 = 0.7500
10 ÷ 2 =  5  remainder 0         0.7500 x 2 = 1.5000
 5 ÷ 2 =  2  remainder 1         0.5000 x 2 = 1.0000
 2 ÷ 2 =  1  remainder 0
 1 ÷ 2 =  0  remainder 1
(41)₁₀ = (101001)₂               (0.6875)₁₀ = (0.1011)₂
(41.6875)₁₀ = (101001.1011)₂
Fig. 1.2: Conversion of decimal 41.6875 into binary
The conversion of decimal 41.6875 into binary is done by first separating the
number into its integer part 41 and fraction part 0.6875. The integer part is
converted by dividing 41 by r = 2 to give an integer quotient 20 and a
remainder of 1. The quotient is again divided by 2 to give a new quotient
and remainder. This process is repeated until the integer quotient becomes
0. The coefficients of the binary number are obtained from the remainders
with the first remainder giving the low-order bit of the converted binary
number.
The fraction part is converted by multiplying it by r = 2 to give an integer and
a fraction. The new fraction (without the integer) is multiplied again by 2 to
give a new integer and a new fraction. This process is repeated until the
fraction part becomes zero or until the number of digits obtained gives the
required accuracy. The coefficients of the binary fraction are obtained from
the integer digits with the first integer computed being the digit to be placed
next to the binary point. Finally, the two parts are combined to give the total
required conversion.
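The successive-division and successive-multiplication procedures of Fig. 1.2 can be sketched in Python (an illustrative addition; the function name is hypothetical):

```python
def decimal_to_binary(integer: int, fraction: float, max_bits: int = 16) -> str:
    """Mirror Fig. 1.2: repeated division by 2 for the integer part,
    repeated multiplication by 2 for the fraction part."""
    int_bits = ""
    while integer > 0:
        integer, remainder = divmod(integer, 2)
        int_bits = str(remainder) + int_bits   # first remainder is the low-order bit
    frac_bits = ""
    while fraction > 0 and len(frac_bits) < max_bits:
        fraction *= 2
        digit = int(fraction)                  # first integer goes next to the binary point
        frac_bits += str(digit)
        fraction -= digit
    return (int_bits or "0") + "." + frac_bits

print(decimal_to_binary(41, 0.6875))  # 101001.1011
```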
Octal and Hexadecimal Numbers
The conversion from and to binary, octal, and hexadecimal representation
plays an important part in digital computers. Since 2^3 = 8 and 2^4 = 16, each
octal digit corresponds to three binary digits and each hexadecimal digit
corresponds to four binary digits. The conversion from binary to octal is
easily accomplished by partitioning the binary number into groups of three
bits each starting from the least significant bit (LSB) position. The
corresponding octal digit is then assigned to each group of bits and the
string of digits so obtained gives the octal equivalent of the binary number.
Consider, for example, a 16-bit register. Physically, one may think of the
register as composed of 16 binary storage cells, with each cell capable of
holding either a 1 or a 0. Suppose that the bit configuration stored in the
register is as shown in Fig.1.3. Since a binary number consists of a string of
1’s and 0’s, the 16-bit register can be used to store any binary number from
0 to 2^16 – 1. For the particular example shown, the binary number stored in
the register is the equivalent of decimal 44899. Starting from the low-order
bit, we partition the register into groups of three bits each (the sixteenth bit
remains in a group by itself). Each group of three bits is assigned its octal
equivalent and placed on the top of the register. The string of octal digits so
obtained represents the octal equivalent of the binary number.
Octal 1 2 7 5 4 3
Binary 1 0 1 0 1 1 1 1 0 1 1 0 0 0 1 1
Hexadecimal A F 6 3
Fig. 1.3: Binary, octal and hexadecimal conversion.
Octal number   Binary-coded octal   Decimal equivalent
0 000 0
1 001 1
2 010 2
3 011 3
4 100 4
5 101 5
6 110 6
7 111 7
10 001 000 8
11 001 001 9
12 001 010 10
24 010 100 20
62 110 010 50
143 001 100 011 99
370 011 111 000 248
Table 1.1: Binary-Coded Octal Numbers
Hexadecimal number   Binary-coded hexadecimal   Decimal equivalent
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
A 1010 10
B 1011 11
C 1100 12
D 1101 13
E 1110 14
F 1111 15
14 0001 0100 20
32 0011 0010 50
63 0110 0011 99
F8 1111 1000 248
Table 1.2: Binary-Coded Hexadecimal Numbers
Conversion from binary to hexadecimal is similar except that the bits are
divided into groups of four. The corresponding hexadecimal digit for each
group of four bits is written as shown below the register of Fig.1.3. The
string of hexadecimal digits so obtained represents the hexadecimal
equivalent of the binary number. The corresponding octal digit for each
group of three bits is easily remembered after studying the first eight entries
listed in Table 1.1. The correspondence between a hexadecimal digit and its
equivalent 4-bit code can be found in the first 16 entries of Table 1.2.
Table 1.1 lists a few octal numbers and their representation in registers in
binary-coded form. The binary code is obtained by the procedure explained
above. Each octal digit is assigned a 3-bit code as specified by the entries
of the first eight digits in the table. Similarly, Table 1.2 lists a few
hexadecimal numbers and their representation in registers in binary-coded
form. Here the binary code is obtained by assigning to each hexadecimal
digit the 4-bit code listed in the first 16 entries of the table.
Comparing the binary-coded octal and hexadecimal numbers with their
binary number equivalent we find that the bit combination in all three
representations is exactly the same. For example, decimal 99, when
converted to binary, becomes 1100011. The binary-coded octal equivalent
of decimal 99 is 143 (001 100 011) and the binary-coded hexadecimal of
decimal 99 is 63 (0110 0011). If we neglect the leading zeros in these two
binary representations, we find that their bit combination is identical. This
should be so because of the straightforward conversion that exists between
binary numbers and octal or hexadecimal. The point of all this is that a string
of 1s and 0s stored in a register could represent a binary number, but this
same string of bits may be interpreted as holding an octal number in binary-
coded form (if we divide the bits into groups of three) or as holding a
hexadecimal number in binary-coded form (if we divide the bits into groups
of four).
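The grouping rules can be sketched in Python (the function names are illustrative additions); the example uses the register contents of Fig. 1.3:

```python
def binary_to_octal(bits: str) -> str:
    """Group bits in threes from the least significant bit and map
    each group to its octal digit (padding the high-order group)."""
    bits = bits.zfill((len(bits) + 2) // 3 * 3)
    return "".join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))

def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the least significant bit."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join("0123456789ABCDEF"[int(bits[i:i + 4], 2)]
                   for i in range(0, len(bits), 4))

word = "1010111101100011"        # 16-bit register of Fig. 1.3
print(binary_to_octal(word))     # 127543
print(binary_to_hex(word))       # AF63
print(int(word, 2))              # 44899
```

The same bit string yields all three interpretations, which is exactly the point made in the text.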
The registers in a digital computer contain many bits. Specifying the content
of registers by their binary values will require a long string of binary digits. It
is more convenient to specify content of registers by their octal or
hexadecimal equivalent. The number of digits is reduced to one-third in the
octal designation and to one-fourth in the hexadecimal designation. For
example, the binary number 1111 1111 1111 has 12 digits. It can be
expressed in octal as 7777 (four digits) or in hexadecimal as FFF (three
digits). Computer manuals invariably choose either the octal or the
hexadecimal designation for specifying contents of registers.
Decimal Representation
The binary number system is the most natural system for a computer, but
people are accustomed to the decimal system. One way to solve this
conflict is to convert all input decimal numbers into binary numbers, let the
computer perform all arithmetic operations in binary and then convert the
binary results back to decimal for the human user to understand. However,
it is also possible for the computer to perform arithmetic operations directly
with decimal numbers provided they are placed in registers in a coded form.
Decimal numbers enter the computer usually as binary-coded alphanumeric
characters. These codes, introduced later, may contain from six to eight bits
for each decimal digit. When decimal numbers are used for internal
arithmetic computations, they are converted into a binary code with four bits
per digit.
A binary code is a group of n bits that can assume up to 2^n distinct
combinations of 1s and 0s, with each combination representing one element
of the set being coded. For example, a set of four elements can be coded by
a 2-bit code, with each element assigned one of the following bit
combinations: 00, 01, 10, or 11. A set of eight elements requires a 3-bit
code; a set of 16 elements requires a 4-bit code, and so on. A binary code
will have some unassigned bit combinations if the number of elements in the
set is not a power of 2. The 10 decimal digits from 0 to 9 form such a set. A
binary code that distinguishes among 10 elements must contain at least four
bits, but six combinations will remain unassigned. Numerous different
codes can be obtained by arranging four bits in 10 distinct combinations.
The bit assignment most commonly used for the decimal digits is the
straight binary assignment listed in the first 10 entries of Table 1.3. This
particular code is called Binary Coded Decimal (BCD). Other decimal
codes are sometimes used and a few of them are given in the section 1.7.
It is very important to understand the difference between the conversion of
decimal numbers into binary and the binary coding of decimal numbers. For
example, when converted to a binary number, the decimal number 99 is
represented by the string of bits 1100011, but when represented in BCD, it
becomes 1001 1001. The only difference between a decimal number
represented by the familiar digit symbols 0, 1, 2, …, 9 and the BCD symbols
0000, 0001, …, 1001 is in the symbols used to represent the digits – the
number itself is exactly the same. A few decimal numbers and their
representation in BCD are listed in Table 1.3.
Decimal number Binary-coded decimal (BCD) number
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
10 0001 0000
20 0010 0000
50 0101 0000
99 1001 1001
248 0010 0100 1000
Table 1.3: Binary Coded Decimal (BCD) Numbers
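The distinction between converting a decimal number to binary and coding it in BCD can be shown with a short Python sketch (the helper name is an illustrative addition):

```python
def to_bcd(n: int) -> str:
    """Encode each decimal digit of n as its own 4-bit group (BCD)."""
    return " ".join(format(int(d), "04b") for d in str(n))

print(to_bcd(99))         # 1001 1001  (binary-coded decimal)
print(format(99, "b"))    # 1100011    (conversion to a binary number)
print(to_bcd(248))        # 0010 0100 1000
```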
Alphanumeric Representation
Many applications of digital computers require the handling of data that
consist not only of numbers, but also of the letters of the alphabet and
certain special characters. An alphanumeric character set is a set of
elements that includes the 10 decimal digits, the 26 letters of the alphabet
and a number of special characters, such as $, +, and =. Such a set
contains between 32 and 64 elements (if only uppercase letters are
included) or between 64 and 128 (if both uppercase and lowercase letters
are included). In the first case, the binary code will require six bits and in
the second case, seven bits. The standard alphanumeric binary code is the
American Standard Code for Information Interchange (ASCII), which
uses seven bits to code 128 characters. The binary code for the uppercase
letters, the decimal digits, and a few special characters is listed in Table 1.4.
Note that the decimal digits in ASCII can be converted into BCD by
removing the three high-order bits, 011.
Binary codes play an important part in digital computer operations. The
codes must be in binary because registers can only hold binary information.
One must realize that binary codes merely change the symbols, not the
meaning of the discrete elements they represent. The operations specified
for digital computers must take into consideration the meaning of the bits
stored in registers so that operations are performed on operands of the
same type. In inspecting the bits of a computer register at random, one is
likely to find that it represents some type of coded information rather than a
binary number.
Binary codes can be formulated for any set of discrete elements such as the
musical notes and chess pieces and their positions on the chessboard.
Binary codes are also used to formulate instructions that specify control
information for the computer.
Character Binary code Character Binary code
A 100 0001 0 011 0000
B 100 0010 1 011 0001
C 100 0011 2 011 0010
D 100 0100 3 011 0011
E 100 0101 4 011 0100
F 100 0110 5 011 0101
G 100 0111 6 011 0110
H 100 1000 7 011 0111
I 100 1001 8 011 1000
J 100 1010 9 011 1001
K 100 1011
L 100 1100
M 100 1101 Space 010 0000
N 100 1110 . 010 1110
O 100 1111 ( 010 1000
P 101 0000 + 010 1011
Q 101 0001 $ 010 0100
R 101 0010 * 010 1010
S 101 0011 ) 010 1001
T 101 0100 - 010 1101
U 101 0101 / 010 1111
V 101 0110 , 010 1100
W 101 0111 = 011 1101
X 101 1000
Y 101 1001
Z 101 1010
Table 1.4: American Standard Code for Information Interchange (ASCII)
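The observation that an ASCII decimal digit becomes its BCD code when the three high-order bits (011) are removed can be checked with a short Python sketch (an illustrative addition):

```python
# The ASCII code of a decimal digit is 011 followed by the digit's 4-bit
# BCD code, so masking off the three high-order bits recovers the BCD value.
for ch in "0579":
    ascii_code = ord(ch)
    bcd = ascii_code & 0b0001111   # drop the high-order 011
    print(ch, format(ascii_code, "07b"), format(bcd, "04b"))
```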
1.4 Complements
Complements are used in digital computers for simplifying the subtraction
operation and for logical manipulation. There are two types of complements
for each base r system: the r’s complement and the (r-1)’s complement.
When the value of the base r is substituted in the name, the two types are
referred to as the 2’s and 1’s complement for binary numbers and the 10’s
and 9’s complement for decimal numbers.
(r-1)’s Complement
Given a number N in base r having n digits, the (r-1)’s complement of N is
defined as (r^n – 1) – N. For decimal numbers r = 10 and r – 1 = 9, so the
9’s complement of N is (10^n – 1) – N. Now, 10^n represents a number that
consists of a single 1 followed by n 0s, and 10^n – 1 is a number represented
by n 9s. For example, with n = 4 we have 10^4 = 10000 and 10^4 – 1 = 9999. It
follows that the 9’s complement of a decimal number is obtained by
subtracting each digit from 9. For example, the 9’s complement of 546700
is 999999 – 546700 = 453299 and the 9’s complement of 12389 is 99999 –
12389 = 87610.
For binary numbers, r = 2 and r – 1 = 1, so the 1’s complement of N is
(2^n – 1) – N. Again, 2^n is represented by a binary number that consists of a
1 followed by n 0s, and 2^n – 1 is a binary number represented by n 1s. For
example, with n = 4, we have 2^4 = (10000)₂ and 2^4 – 1 = (1111)₂. Thus the
1’s complement of a binary number is obtained by subtracting each digit
from 1. However, the subtraction of a binary digit from 1 causes the bit to
change from 0 to 1 or from 1 to 0. Therefore, the 1’s complement of a
binary number is formed by changing 1s into 0s and 0s into 1s. For
example, the 1’s complement of 1011001 is 0100110 and the 1’s
complement of 0001111 is 1110000.
The (r – 1)’s complement of an octal or hexadecimal number is obtained by
subtracting each digit from 7 or F (decimal 15), respectively.
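The digit-by-digit rule works for any base up to 16 and can be sketched in Python (the function name is an illustrative addition):

```python
def diminished_radix_complement(number: str, r: int) -> str:
    """(r-1)'s complement: subtract each digit of the number from r - 1."""
    digits = "0123456789ABCDEF"
    return "".join(digits[(r - 1) - digits.index(d)] for d in number.upper())

print(diminished_radix_complement("546700", 10))  # 453299  (9's complement)
print(diminished_radix_complement("1011001", 2))  # 0100110 (1's complement)
```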
r’s Complement
The r’s complement of an n-digit number N in base r is defined as r^n – N for
N ≠ 0 and 0 for N = 0. Comparing with the (r – 1)’s complement, we note
that the r’s complement is obtained by adding 1 to the (r – 1)’s complement,
since r^n – N = [(r^n – 1) – N] + 1. Thus the 10’s complement of the decimal
2389 is 7610 + 1 = 7611 and is obtained by adding 1 to the 9’s complement
value. The 2’s complement of binary 101100 is 010011 + 1 = 010100 and is
obtained by adding 1 to the 1’s complement value.
Since 10^n is a number represented by a 1 followed by n 0s, 10^n – N,
which is the 10’s complement of N, can also be formed by leaving all least
significant 0s unchanged, subtracting the first nonzero least significant digit
from 10, and then subtracting all higher significant digits from 9. The 10’s
complement of 246700 is 753300 and is obtained by leaving the two zeros
unchanged, subtracting 7 from 10, and subtracting the other three digits
from 9. Similarly, the 2’s complement can be formed by leaving all least
significant 0s and the first 1 unchanged, and then replacing 1s by 0s and
0s by 1s in all other higher significant bits. The 2’s complement of 1101100
is 0010100 and is obtained by leaving the two low-order 0s and the first
1 unchanged, and then replacing 1s by 0s and 0s by 1s in the other four
most significant bits.
In the definitions above it was assumed that the numbers do not have a
radix point. If the original number N contains a radix point, it should be
removed temporarily to form the r’s or (r-1)’s complement. The radix point
is then restored to the complemented number in the same relative position.
It is also worth mentioning that the complement of the complement restores
the number to its original value. The r’s complement of N is r^n – N, and the
complement of that complement is r^n – (r^n – N) = N, giving back the original
number.
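The relation r^n – N = [(r^n – 1) – N] + 1 and the complement-of-the-complement property can be checked with a short Python sketch; the function name r_complement is our own, not from the text:

```python
def r_complement(digits, r):
    """r's complement of an n-digit number: r**n - N for N != 0, and 0 for N = 0."""
    n = len(digits)
    value = int("".join(map(str, digits)), r)
    if value == 0:
        return [0] * n
    comp = r**n - value
    out = []
    for _ in range(n):          # peel off the digits of the complement, low to high
        out.append(comp % r)
        comp //= r
    return out[::-1]

print(r_complement([2, 3, 8, 9], 10))       # 10's complement of 2389 -> [7, 6, 1, 1]
print(r_complement([1, 0, 1, 1, 0, 0], 2))  # 2's complement of 101100 -> [0, 1, 0, 1, 0, 0]
```

Complementing twice returns the original digits, as the text observes.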
Subtraction of Unsigned Numbers
The direct method of subtraction taught in elementary schools uses the
borrow concept. In this method we borrow a 1 from a higher significant
position when the minuend digit is smaller than the corresponding
subtrahend digit. This seems to be easiest when people perform
subtraction with paper and pencil. When subtraction is implemented with
digital hardware, this method is found to be less efficient than the method
that uses complements.
The subtraction of two n-digit unsigned numbers M – N (N ≠ 0) in base r
can be done as follows:
1. Add the minuend M to the r’s complement of the subtrahend N.
This performs M + (r^n – N) = M – N + r^n.
2. If M ≥ N, the sum will produce an end carry r^n, which is discarded,
and what is left is the result M – N.
3. If M < N, the sum does not produce an end carry and is equal to r^n –
(N – M), which is the r’s complement of (N – M). To obtain the
answer in a familiar form, take the r’s complement of the sum and
place a negative sign in front.
Consider, for example, the subtraction 72532 – 13250 = 59282. The 10’s
complement of 13250 is 86750. Therefore:
M = 72532
10’s complement of N = +86750
Sum = 159282
Discard end carry 10^5 = -100000
Answer = 59282
Now consider an example with M < N. The subtraction 13250 – 72532
produces negative 59282. Using the procedure with complements, we have
M = 13250
10’s complement of N = +27468
Sum = 40718
There is no end carry. Answer is negative 59282 = 10’s complement of
40718.
Since we are dealing with unsigned numbers, there is really no way to get
an unsigned result for the second example. When working with paper and
pencil, we recognize that the answer must be changed to a signed negative
number. When subtracting with complements, the negative answer is
recognized by the absence of the end carry and the complemented result.
Subtraction with complements is done with binary numbers in a similar
manner using the same procedure outlined above. Using the two binary
numbers X = 1010100 and Y = 1000011, we perform the subtraction X – Y
and Y – X using 2’s complements:
X = 1010100
2’s complement of Y = +0111101
Sum = 10010001
Discard end carry 2^7 = -10000000
Answer: X – Y = 0010001
Y = 1000011
2’s complement of X = +0101100
Sum = 1101111
There is no end carry. Answer is negative 0010001 = 2’s complement of
1101111.
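The three-step procedure can be sketched in Python as follows; subtract_unsigned is an illustrative name of ours, and the calls reproduce the worked decimal and binary examples above:

```python
def subtract_unsigned(m, n, digits, r=10):
    """M - N for unsigned n-digit numbers via the r's complement of N."""
    s = m + (r**digits - n)        # M + (r^n - N) = M - N + r^n
    if s >= r**digits:             # end carry present, so M >= N
        return s - r**digits       # discard the carry; what is left is M - N
    return -(r**digits - s)        # no end carry: negate the r's complement of the sum

print(subtract_unsigned(72532, 13250, 5))               # 59282
print(subtract_unsigned(13250, 72532, 5))               # -59282
print(subtract_unsigned(0b1010100, 0b1000011, 7, r=2))  # 17, i.e. 0010001
```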
1.5 Fixed-Point Representation
Positive integers, including zero, can be represented as unsigned numbers.
However, to represent negative integers, we need a notation for negative
values. In ordinary arithmetic, a negative number is indicated by a minus
sign and a positive number by a plus sign. Because of hardware limitations,
computers must represent everything with 1s and 0s, including the sign of a
number. As a consequence, it is customary to represent the sign with a bit
placed in the leftmost position of the number. The convention is to make the
sign bit equal to 0 for positive and to 1 for negative.
In addition to the sign, a number may have a binary (or decimal) point.
The position of the binary point is needed to represent fractions, integers, or
mixed integer-fraction numbers. The representation of the binary point in a
register is complicated by the fact that it is characterized by a position in the
register. There are two ways of specifying the position of the binary point in
a register: by giving it a fixed position or by employing a floating-point
representation. The fixed-point method assumes that the binary point is
always fixed in one position. The two positions most widely used are (1) a
binary point in the extreme left of the register to make the stored number a
fraction, and (2) a binary point in the extreme right of the register to make
the stored number an integer. In either case, the binary point is not actually
present, but its presence is assumed from the fact that the number stored in
the register is treated as a fraction or as an integer. The floating-point
representation uses a second register to store a number that designates the
position of the decimal point in the first register. Floating-point
representation is discussed further in the next section.
Integer Representation
When an integer binary number is positive, the sign is represented by 0 and
the magnitude by a positive binary number. When the number is negative,
the sign is represented by 1 but the rest of the number may be represented
in one of three possible ways:
1. Signed-magnitude representation
2. Signed-1’s complement representation
3. Signed-2’s complement representation
The signed-magnitude representation of a negative number consists of the
magnitude and a negative sign. In the other two representations, the
negative number is represented in either the 1’s or 2’s complement of its
positive value. As an example, consider the signed number 14 stored in an
8-bit register. +14 is represented by a sign bit of 0 in the leftmost position
followed by the binary equivalent of 14: 00001110. Note that each of the
eight bits of the register must have a value and therefore 0’s must be
inserted in the most significant positions following the sign bit. Although
there is only one way to represent +14, there are three different ways
to represent -14 with eight bits.
In signed-magnitude representation 1 0001110
In signed-1’s complement representation 1 1110001
In signed-2’s complement representation 1 1110010
The signed-magnitude representation of -14 is obtained from +14 by
complementing only the sign bit. The signed-1’s complement representation
of -14 is obtained by complementing all the bits of +14, including the sign
bit. The signed-2’s complement representation is obtained by taking the 2’s
complement of the positive number, including its sign bit.
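For illustration, the three encodings of -14 can be generated with a short Python sketch; the helper name encodings is our own:

```python
def encodings(value, bits=8):
    """Return the signed-magnitude, 1's-complement and 2's-complement
    bit patterns of a negative integer in an n-bit register."""
    mag = abs(value)
    w = '0{}b'.format(bits)
    mask = (1 << bits) - 1
    signed_magnitude = format((1 << (bits - 1)) | mag, w)  # sign bit 1 + magnitude
    ones_complement = format(~mag & mask, w)               # flip every bit of +|value|
    twos_complement = format(-mag & mask, w)               # 1's complement plus 1
    return signed_magnitude, ones_complement, twos_complement

print(encodings(-14))  # ('10001110', '11110001', '11110010')
```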
The signed-magnitude system is used in ordinary arithmetic but is awkward
when employed in computer arithmetic, so a signed-complement
representation is normally used instead. The 1’s complement imposes
difficulties because it has two representations of 0 (+0 and -0), and it is
seldom used for arithmetic operations except in some older computers.
The 1’s complement is useful
as a logical operation since the change of 1 to 0 or 0 to 1 is equivalent to a
logical complement operation. The following discussion of signed binary
arithmetic deals exclusively with the signed-2’s complement representation
of negative numbers.
Arithmetic Addition
The addition of two numbers in the signed-magnitude system follows the
rules of ordinary arithmetic. If the signs are the same, we add the two
magnitudes and give the sum the common sign. If the signs are different,
we subtract the smaller magnitude from the larger and give the result the
sign of the larger magnitude. For example, (+25) + (-37) = -(37 – 25) = -12
and is done by subtracting the smaller magnitude 25 from the larger
magnitude 37 and using the sign of 37 for the sign of the result. This is a
process that requires the comparison of the signs and the magnitudes and
then performing either addition or subtraction.
By contrast, the rule for adding numbers in the signed-2’s complement
system does not require a comparison or subtraction, only addition and
complementation. The procedure is very simple and can be stated as
follows: Add the two numbers, including their sign bits, and discard
any carry out of the sign (leftmost) bit position. Numerical examples for
addition are shown below. Note that negative numbers must initially be
in 2’s complement and that if the sum obtained after the addition is
negative, it is in 2’s complement form.
+6 00000110 -6 11111010
+13 00001101 +13 00001101
----- ------------- ----- -------------
+19 00010011 +7 00000111
+6 00000110 -6 11111010
-13 11110011 -13 11110011
---- ------------- ----- -------------
-7 11111001 -19 11101101
In each of the four cases, the operation performed is always addition,
including the sign bits. Any carry out of the sign bit position is discarded,
and negative results are automatically in 2’s complement form.
The complement form of representing negative numbers is unfamiliar to
people used to the signed-magnitude system. To determine the value of a
negative number when in signed-2’s complement, it is necessary to convert
it to a positive number to place it in a more familiar form. For example, the
signed binary number 11111001 is negative because the leftmost bit is 1.
Its 2’s complement is 00000111, which is the binary equivalent of +7. We
therefore recognize the original negative number to be equal to -7.
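The addition rule can be sketched as a Python function; add_signed is an illustrative name of ours, and masking to n bits models the finite register (Python integers themselves are unbounded):

```python
def add_signed(a, b, bits=8):
    """Signed-2's complement addition: add everything, sign bits included,
    and discard any carry out of the sign position."""
    mask = (1 << bits) - 1
    s = ((a & mask) + (b & mask)) & mask              # masking drops the end carry
    return s - (1 << bits) if s >> (bits - 1) else s  # reinterpret pattern as signed

for x, y in [(6, 13), (-6, 13), (6, -13), (-6, -13)]:
    print(x, '+', y, '=', add_signed(x, y))  # 19, 7, -7, -19
```

The four calls reproduce the four numerical examples shown above.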
Arithmetic Subtraction
Subtraction of two signed binary numbers when negative numbers are in 2’s
complement form is very simple and can be stated as follows: Take the 2’s
complement of the subtrahend (including the sign bit) and add it to the
minuend (including the sign bit). A carry out of the sign bit position is
discarded.
This procedure stems from the fact that a subtraction operation can be
changed to an addition operation if the sign of the subtrahend is changed.
This is demonstrated by the following relationship:
(±A) – (+B) = (±A) + (-B)
(±A) – (-B) = (±A) + (+B)
But changing a positive number to a negative number is easily done by
taking its 2’s complement. The reverse is also true because the complement
of a negative number in complement form produces the equivalent positive
number. Consider the subtraction of (-6) – (-13) = +7. In binary with eight
bits this is written as 11111010 – 11110011. The subtraction is changed to
addition by taking the 2’s complement of the subtrahend (-13) to give (+13).
In binary this is 11111010 + 00001101 = 100000111. Removing the end
carry, we obtain the correct answer 00000111 (+7).
It is worth noting that binary numbers in the signed-2’s complement system
are added and subtracted by the same basic addition and subtraction rules
as unsigned numbers. Therefore, computers need only one common
hardware circuit to handle both types of arithmetic. The user or programmer
must interpret the results of such addition or subtraction differently
depending on whether it is assumed that the numbers are signed or
unsigned.
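The subtraction rule can be sketched the same way; sub_signed is our own illustrative name:

```python
def sub_signed(a, b, bits=8):
    """Signed-2's complement subtraction: add the 2's complement of the
    subtrahend (sign bit included) and discard the end carry."""
    mask = (1 << bits) - 1
    b_comp = (~b + 1) & mask                          # 2's complement of the subtrahend
    s = ((a & mask) + b_comp) & mask                  # masking discards the end carry
    return s - (1 << bits) if s >> (bits - 1) else s  # reinterpret pattern as signed

print(sub_signed(-6, -13))  # 7, the (-6) - (-13) example worked above
```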
Overflow
When two numbers of n digits each are added and the sum occupies n+1
digits, we say that an overflow has occurred. When the addition is
performed with paper and pencil, an overflow is not a problem since there is
no limit to the width of the page to write down the sum. An overflow is a
problem in digital computers because the width of registers is finite. A result
that contains n+1 bits cannot be accommodated in a register with a
standard length of n bits. For this reason, many computers detect the
occurrence of an overflow, and when it occurs, a corresponding flip-flop is
set which can then be checked by the user.
The detection of an overflow after the addition of two binary numbers
depends on whether the numbers are considered to be signed or unsigned.
When two unsigned numbers are added, an overflow is detected from the
end carry out of the most significant position. In the case of signed numbers,
the leftmost bit always represents the sign, and negative numbers are in 2’s
complement form. When two signed numbers are added, the sign bit is
treated as part of the number and the end carry does not indicate an
overflow.
An overflow cannot occur after an addition if one number is positive and the
other is negative, since adding a positive number to a negative number
produces a result that is smaller than the larger of the two original numbers.
An overflow may occur if the two numbers added are both positive or both
negative. To see how this can happen, consider the following example. Two
signed binary numbers, +70 and +80, are stored in two 8-bit registers. The
range of numbers that each register can accommodate is from binary +127
to binary -128. Since the sum of the two numbers is +150, it exceeds the
capacity of the 8-bit register. This is true if the numbers are both positive or
both negative. The two additions in binary are shown below together with
the last two carries.
carries: 0 1 carries: 1 0
+70 0 1000110 -70 1 0111010
+80 0 1010000 -80 1 0110000
------- ---------------- ------- ----------------
+150 1 0010110 -150 0 1101010
Note that the 8-bit result that should have been positive has a negative sign
bit and the 8-bit result that should have been negative has a positive sign
bit. If, however, the carry out of the sign bit position is taken as the sign bit
of the result, the 9-bit answer so obtained will be correct. Since the answer
cannot be accommodated within 8 bits, we say that an overflow occurred.
An overflow condition can be detected by observing the carry into the sign
bit position and the carry out of the sign bit position. If these two carries are
not equal, an overflow condition is produced. This is indicated in the
examples where the two carries are explicitly shown. If the two carries are
applied to an exclusive-OR gate, an overflow will be detected when the
output of the gate is equal to 1.
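The carry-comparison rule can be sketched in Python; add_with_overflow is a hypothetical helper of ours, and the exclusive-OR of the two carries plays the role of the XOR gate:

```python
def add_with_overflow(a, b, bits=8):
    """Add two n-bit signed 2's complement numbers; overflow is detected as
    (carry into the sign bit) XOR (carry out of the sign bit)."""
    mask = (1 << bits) - 1
    low = (a & (mask >> 1)) + (b & (mask >> 1))  # add all bits except the sign bits
    carry_in = (low >> (bits - 1)) & 1           # carry into the sign position
    total = (a & mask) + (b & mask)
    carry_out = (total >> bits) & 1              # end carry out of the sign position
    return total & mask, carry_in ^ carry_out

print(add_with_overflow(70, 80))    # (150, 1): pattern 10010110, overflow set
print(add_with_overflow(-70, -80))  # (106, 1): pattern 01101010, overflow set
```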
Decimal Fixed-Point Representation
The representation of decimal numbers in registers is a function of the
binary code used to represent a decimal digit. A 4-bit decimal code requires
four flip-flops for each decimal digit. The representation of 4385 in BCD
requires 16 flip-flops, four flip-flops for each digit. The number will be
represented in a register with 16 flip-flops as follows:
0100 0011 1000 0101
By representing numbers in decimal we are wasting a considerable amount
of storage space since the number of bits needed to store a decimal number
in a binary code is greater than the number of bits needed for its equivalent
binary representation. Also, the circuits required to perform decimal
arithmetic are more complex. However, there are some advantages in the
use of decimal representation because computer input and output data are
generated by people who use the decimal system. Some applications, such
as business data processing, require small amounts of arithmetic
computations compared to the amount required for input and output of
decimal data. For this reason, some computers and all electronic calculators
perform arithmetic operations directly with the decimal data (in a binary
code) and thus eliminate the need for conversion into binary and back to
decimal. Some computer systems have hardware for arithmetic calculations
with both binary and decimal data.
The representation of signed decimal numbers in BCD is similar to the
representation of signed numbers in binary. We can either use the familiar
signed-magnitude system or the signed-complement system. The sign of a
decimal number is usually represented with four bits to conform to the
4-bit code of the decimal digits. It is customary to designate a plus with four
0’s and a minus with the BCD equivalent of 9, which is 1001.
The signed-magnitude system is difficult to use with computers. The
signed-complement system can be either the 9’s or the 10’s complement,
but the 10’s complement is the one most often used. To obtain the 10’s
complement of a BCD number, we first take the 9’s complement and then
add one to the least significant digit. The 9’s complement is calculated from
the subtraction of each digit from 9.
The procedures developed for the signed-2’s complement system apply also
to the signed-10’s complement system for decimal numbers. Addition is
done by adding all digits, including the sign digit, and discarding the end
carry. Obviously, this assumes that all negative numbers are in 10’s
complement form. Consider the addition (+375) + (-240) = +135 done in the
signed-10’s complement system.
0 375 (0000 0011 0111 0101)BCD
+9 760 (1001 0111 0110 0000)BCD
-------- -----------------------------------
0 135 (0000 0001 0011 0101)BCD
The 9 in the leftmost position of the second number indicates that the
number is negative. 9760 is the 10’s complement of 0240. The two numbers
are added and the end carry is discarded to obtain +135. Of course, the
decimal numbers inside the computer must be in BCD, including the sign
digits. The addition is done with BCD adders.
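The signed-10’s complement scheme can be sketched in Python; to_bcd and add_signed_10s are illustrative helpers of ours, with the sign carried as an extra high-order decimal digit (0 for plus, 9 for minus):

```python
def to_bcd(n, digits):
    """Encode an unsigned decimal number as 4-bit BCD digit groups."""
    return [format(int(d), '04b') for d in str(n).zfill(digits)]

def add_signed_10s(a, b, digits=3):
    """Signed-10's complement addition: sign digit 0 means plus, 9 means minus."""
    def encode(v):                     # negatives go to 10's complement form
        return v if v >= 0 else 10**(digits + 1) + v
    s = (encode(a) + encode(b)) % 10**(digits + 1)   # add, discard the end carry
    return s if s < 10**digits else s - 10**(digits + 1)

print(add_signed_10s(375, -240))  # 135, the (+375) + (-240) example above
print(to_bcd(9760, 4))            # ['1001', '0111', '0110', '0000']
```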
The subtraction of decimal numbers either unsigned or in the signed-10’s
complement system is the same as in the binary case. Take the 10’s
complement of the subtrahend and add it to the minuend. Many computers
have special hardware to perform arithmetic calculations directly with
decimal numbers in BCD. The user of the computer can specify by
programmed instructions that the arithmetic operations be performed with
decimal numbers directly without having to convert them to binary.
1.6 Floating-Point Representation
The floating-point representation of a number has two parts. The first part
represents a signed, fixed-point number called the mantissa. The second
part designates the position of the decimal (or binary) point and is called the
exponent. The fixed-point mantissa may be a fraction or an integer. For
example, the decimal number +6132.789 is represented in floating-point
with a fraction and an exponent as follows:
Fraction Exponent
+0.6132789 +04
The value of the exponent indicates that the actual position of the decimal
point is four positions to the right of the indicated decimal point in the
fraction. This representation is equivalent to the scientific notation
+0.6132789 x 10^+4.
Floating-point is always interpreted to represent a number in the following
form:
m x r^e
Only the mantissa m and the exponent e are physically represented in the
register (including their signs). The radix r and the radix-point position of the
mantissa are always assumed. The circuits that manipulate the floating-
point numbers in registers conform with these two assumptions in order to
provide the correct computational results.
A floating-point binary number is represented in a similar manner except
that it uses base 2 for the exponent. For example, the binary number
+1001.11 is represented with an 8-bit fraction and 6-bit exponent as follows:
Fraction Exponent
01001110 000100
The fraction has a 0 in the leftmost position to denote positive. The binary
point of the fraction follows the sign bit but is not shown in the register. The
exponent has the equivalent binary number +4. The floating-point number is
equivalent to
m x 2^e = +(.1001110)_2 x 2^+4
A floating-point number is said to be normalized if the most significant digit
of the mantissa is nonzero. For example, the decimal number 350 is
normalized but 00035 is not. Regardless of where the position of the radix
point is assumed to be in the mantissa, the number is normalized only if its
leftmost digit is nonzero. For example, the 8-bit binary number 00011010 is
not normalized because of the three leading 0s. The number can be
normalized by shifting it three positions to the left and discarding the leading
0s to obtain 11010000. The three shifts multiply the number by 2^3 = 8. To
keep the same value for the floating-point number, the exponent must be
decreased by 3. Normalized numbers provide the maximum possible
precision for the floating-point number. A zero cannot be normalized
because it does not have a nonzero digit. It is usually represented in
floating-point by all 0s in the mantissa and exponent.
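The normalization step can be sketched in Python; normalize is an illustrative helper of ours, shifting the mantissa left past its leading 0s and decreasing the exponent once per shift so the value is unchanged:

```python
def normalize(mantissa, exponent):
    """Shift out leading 0s; each left shift multiplies the mantissa by 2,
    so the exponent is decreased by 1 to compensate."""
    while mantissa and mantissa[0] == '0' and '1' in mantissa:
        mantissa = mantissa[1:] + '0'   # one left shift, zero-fill on the right
        exponent -= 1
    return mantissa, exponent

print(normalize('00011010', 0))  # ('11010000', -3), the example above
```

A mantissa of all 0s (the representation of zero) is returned unchanged, since zero cannot be normalized.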
Arithmetic operations with floating-point numbers are more complicated than
arithmetic operations with fixed-point numbers and their execution takes
longer and requires more complex hardware. However, floating-point
representation is a must for scientific computations because of the scaling
problems involved with fixed-point computations. Many computers and all
electronic calculators have the built-in capability of performing floating-point
arithmetic operations. Computers that do not have hardware for floating-
point computations have a set of subroutines to help the user program
scientific problems with floating-point numbers.
1.7 Other Binary Codes
In previous sections we introduced the most common types of binary-coded
data found in digital computers. Other binary codes for decimal numbers
and alphanumeric characters are sometimes used. Digital computers also
employ other binary codes for special applications. A few additional binary
codes encountered in digital computers are presented in this section.
Gray Code
Digital systems can process data in discrete form only. Many physical
systems supply continuous output data. The data must be converted into
digital form before they can be used by a digital computer. Continuous, or
analog, information is converted into digital form by means of an analog-to-
digital converter. The reflected binary or Gray code, shown in Table 1.5, is
sometimes used for the converted digital data. The advantage of the Gray
code over straight binary numbers is that the Gray code changes by only
one bit as it sequences from one number to the next. In other words, the
change from any number to the next in sequence is recognized by a change
of only one bit from 0 to 1 or from 1 to 0. A typical application of the Gray
code occurs when the analog data are represented by the continuous
change of a shaft position. The shaft is partitioned into segments with each
segment assigned a number. If adjacent segments are made to correspond
to adjacent Gray code numbers, ambiguity is reduced when the shaft
position is in the line that separates any two segments.
Binary code   Decimal equivalent   Binary code   Decimal equivalent
0000          0                    1100          8
0001          1                    1101          9
0011          2                    1111          10
0010          3                    1110          11
0110          4                    1010          12
0111          5                    1011          13
0101          6                    1001          14
0100          7                    1000          15
Table 1.5: 4-Bit Gray Code
Gray code counters are sometimes used to provide the timing sequence
that controls the operations in a digital system. A Gray code counter is a
counter whose flip-flops go through a sequence of states as specified in
Table 1.5. Gray code counters remove the ambiguity during the change
from one state of the counter to the next because only one bit can change
during the state transition.
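One common way to generate the reflected binary code (a standard construction, not described in the text itself) is to XOR each binary number with itself shifted right by one bit. A short Python sketch:

```python
def binary_to_gray(n):
    """Reflected binary (Gray) code of n: XOR each bit with the bit above it."""
    return n ^ (n >> 1)

codes = [format(binary_to_gray(i), '04b') for i in range(16)]
print(codes[:4])  # ['0000', '0001', '0011', '0010'], matching Table 1.5
# adjacent codes (including the wrap from 15 back to 0) differ in exactly one bit
print(all(bin(int(a, 2) ^ int(b, 2)).count('1') == 1
          for a, b in zip(codes, codes[1:] + codes[:1])))  # True
```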
Other Decimal Codes
Binary codes for decimal digits require a minimum of four bits. Numerous
different codes can be formulated by arranging four or more bits in 10
distinct possible combinations. A few possibilities are shown in Table 1.6.
Decimal digit   BCD 8421   2421   Excess-3   Excess-3 Gray
0               0000       0000   0011       0010
1               0001       0001   0100       0110
2               0010       0010   0101       0111
3               0011       0011   0110       0101
4               0100       0100   0111       0100
5               0101       1011   1000       1100
6               0110       1100   1001       1101
7               0111       1101   1010       1111
8               1000       1110   1011       1110
9               1001       1111   1100       1010
Unused          1010       0101   0000       0000
bit             1011       0110   0001       0001
combinations    1100       0111   0010       0011
                1101       1000   1101       1000
                1110       1001   1110       1001
                1111       1010   1111       1011
Table 1.6: Four Different Binary Codes for the Decimal Digit
The BCD (binary-coded decimal) has been introduced before. It uses a
straight assignment of the binary equivalent of the digit. The six unused bit
combinations listed have no meaning when BCD is used, just as the letter H
has no meaning when decimal digit symbols are written down. For example,
saying that 1001 1110 is a decimal number in BCD is like saying that 9H is a
decimal number in the conventional symbol designation. Both cases contain
an invalid symbol and therefore designate a meaningless number.
One disadvantage of using BCD is the difficulty encountered when the 9’s
complement of the number is to be computed. On the other hand, the 9’s
complement is easily obtained with the 2421 and the excess-3 codes listed
in Table 1.6. These two codes have a self-complementing property which
means that the 9’s complement of a decimal number, when represented in
one of these codes, is easily obtained by changing 1’s to 0’s and 0’s to 1’s.
The property is useful when arithmetic operations are done in signed-
complement representation.
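The self-complementing property can be verified with a short Python sketch; the code tables below are copied from Table 1.6, and flip is an illustrative helper of ours:

```python
# 2421 and excess-3 encodings of the decimal digits 0..9 (from Table 1.6)
CODE_2421 = ['0000', '0001', '0010', '0011', '0100',
             '1011', '1100', '1101', '1110', '1111']
EXCESS_3 = [format(d + 3, '04b') for d in range(10)]

def flip(bits):
    """Complement every bit of a code word (change 1s to 0s and 0s to 1s)."""
    return bits.translate(str.maketrans('01', '10'))

# self-complementing: flipping the code of digit d gives the code of 9 - d
print(all(flip(CODE_2421[d]) == CODE_2421[9 - d] for d in range(10)))  # True
print(all(flip(EXCESS_3[d]) == EXCESS_3[9 - d] for d in range(10)))    # True
```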
The 2421 is an example of a weighted code. In a weighted code, the bits
are multiplied by the weights indicated and the sum of the weighted bits
gives the decimal digit. For example, the bit combination 1101,
when weighted by the respective weights 2, 4, 2, and 1, gives the decimal
equivalent of 2 x 1 + 4 x 1 + 2 x 0 + 1 x 1 = 7. The BCD code can be assigned the
weights 8421 and for this reason it is sometimes called the 8421 code.
The excess-3 code is a decimal code that has been used in older
computers. This is an unweighted code. Its binary code assignment is
obtained from the corresponding BCD equivalent binary number after the
addition of binary 3 (0011).
From Table 1.5 we note that the Gray code is not suited for a decimal code
if we were to choose the first 10 entries in the table. This is because the
transition from 9 back to 0 involves a change of three bits (from 1101 to
0000). To overcome this difficulty, we choose the 10 codes from the entry
0010 (decimal equivalent 3) through the entry 1010 (decimal equivalent 12).
Now the transition from
1010 to 0010 involves a change of only one bit. Since the code has been
shifted up three numbers, it is called the excess-3 Gray. This code is listed
with the other decimal codes in Table 1.6.
Other Alphanumeric Codes
The ASCII code (Table 1.4) is the standard code commonly used for the
transmission of binary information. Each character is represented by a 7-bit
code and usually an eighth bit is inserted for parity. The code consists of
128 characters. Ninety-five characters represent graphic symbols that
include upper- and lowercase letters, numerals zero to nine, punctuation
marks, and special symbols. Twenty-three characters represent format
effectors, which are functional characters for controlling the layout of printing
or display devices such as carriage return, line feed, horizontal tabulation,
and back space. The other 10 characters are used to direct the data
communication flow and report its status.
Another alphanumeric (sometimes called alphameric) code used in IBM
equipment is the EBCDIC (Extended BCD Interchange Code). It uses
eight bits for each character (and a ninth bit for parity). EBCDIC has the
same character symbols as ASCII but the bit assignment to characters is
different.
When alphanumeric characters are used internally in a computer for data
processing (not for transmission purposes) it is more convenient to use a
6-bit code to represent 64 characters. A 6-bit code can specify the 26
uppercase letters of the alphabet, numerals zero to nine, and up to 28
special characters. This set of characters is usually sufficient for data-
processing purposes. Using fewer bits to code characters has the
advantage of reducing the memory space needed to store large quantities of
alphanumeric data.
1.8 Summary
This unit explains the various units of a digital computer and the different
data formats used in computers. We have also learnt how numbers are
represented in digital computers and the different types of codes used in
computers and communication.
Self Assessment Questions
1. The word digital implies that the information in the computer is
represented by variables that take a limited number of ____________
values.
2. The _________ of the computer consists of all the electronic
components and electromechanical devices that comprise the physical
entity of the device.
3. The ______________ of a computer consists of a collection of
programs whose purpose is to make more effective use of the
computer.
4. The ____________ contains electronic circuits for communicating and
controlling the transfer of information between the computer and the
outside world.
5. ASCII stands for _________________________.
1.9 Terminal Questions
1. Convert the following binary numbers to decimal: 101110; 1110101; and
110110100.
2. Convert the following decimal numbers to binary: 1231; 673; and 1998.
3. Convert the hexadecimal number F3A7C2 to binary and octal.
4. Explain different types of binary codes.
1.10 Answers
Self Assessment Questions:
1. discrete
2. hardware
3. system software
4. input-output processor (IOP)
5. American Standard Code for Information Interchange
Terminal Questions:
1. Refer Section 1.3
2. Refer Section 1.3
3. Refer Section 1.3
4. Refer Section 1.7
Computer Organization and Architecture Unit 2
Sikkim Manipal University - DDE Page No. 37
Unit 2 Register Transfer and Microoperations
Structure:
2.1 Introduction
Objectives
2.2 Register Transfer Language
2.3 Register Transfer
2.4 Bus and Memory Transfers
2.5 Arithmetic Microoperations
2.6 Logic Microoperations
2.7 Shift Microoperations
2.8 Arithmetic Logic Shift Unit
2.9 Summary
2.10 Terminal Questions
2.11 Answers
2.1 Introduction
A digital system is an interconnection of digital hardware modules that
accomplish a specific information-processing task. Digital systems vary in
size and complexity, from a few integrated circuits to a complex of
interconnected and interacting digital computers. Digital system design
invariably uses a modular approach. The modules are constructed from
such digital components as registers, decoders, arithmetic elements, and
control logic. The various modules are interconnected with common data
and control paths to form a digital computer system.
Objectives:
After studying this unit, the learner will be able to
Explain Register Transfer Language
Understand bus selection techniques
Explain arithmetic and logical circuits
Understand the operation of arithmetic logic shift unit
2.2 Register Transfer Language
Digital modules are best defined by the registers they contain and the
operations that are performed on the data stored in them. The operations
executed on data stored in registers are called microoperations. A
microoperation is an elementary operation performed on the information
stored in one or more registers. The result of the operation may replace the
previous binary information of a register or may be transferred to another
register. Examples of microoperations are shift, count, clear, and load.
The internal hardware organization of a digital computer is best defined by
specifying:
1. The set of registers it contains and their function.
2. The sequence of microoperations performed on the binary
information stored in the registers.
3. The control that initiates the sequence of microoperations.
It is possible to specify the sequence of microoperations in a computer by
explaining every operation in words, but this procedure usually involves a
lengthy descriptive explanation. It is more convenient to adopt a suitable
symbology to describe the sequence of transfers between registers and the
various arithmetic and logic microoperations associated with the transfers.
The use of symbols instead of a narrative explanation provides an organized
and concise manner for listing the microoperation sequences in registers
and the control functions that initiate them.
The symbolic notation used to describe the microoperation transfers among
registers is called a register transfer language. The term “register transfer”
implies the availability of hardware logic circuits that can perform a stated
microoperation and transfer the result of the operation to the same or
another register. The word “language” is borrowed from programmers, who
apply this term to programming languages. A programming language is a
procedure for writing symbols to specify a given computational process.
Similarly, a natural language such as English is a system for writing symbols
and combining them into words and sentences for the purpose of
communication between people. A register transfer language is a system for
expressing in symbolic form the microoperation sequences among the
registers of a digital module. It is a convenient tool for describing the internal
organization of digital computers in a concise and precise manner. It can also
be used to facilitate the design process of digital systems.
The register transfer language adopted here is believed to be as simple as
possible, so it should not take very long to memorize. We will proceed to
define symbols for various types of microoperations, and at the same time,
describe associated hardware that can implement the stated
microoperations. The symbolic designation introduced in this chapter will be
utilized in subsequent chapters to specify the register transfers, the
microoperations, and the control functions that describe the internal
hardware organization of digital computers.
2.3 Register Transfer
Computer registers are designated by capital letters (sometimes followed by
numerals) to denote the function of the register. For example, the register
that holds an address for the memory unit is usually called a memory
address register and is designated by the name MAR. Other designations
for registers are PC (for program counter), IR (for instruction register),
and R1 (for processor register). The individual flip-flops in an n-bit register
are numbered in sequence from 0 through n – 1, starting from 0 in the
rightmost position and increasing the numbers toward the left. Fig.2.1
shows the representation of registers in block diagram form. The most
common way to represent a register is by a rectangular box with the name
of the register inside, as in Fig.2.1 (a). The individual bits can be
distinguished as in (b). The numbering of bits in a 16-bit register can be
marked on top of the box as shown in (c). A 16-bit register is partitioned into
two parts in (d). Bits 0 through 7 are assigned the symbol L (for low byte)
and bits 8 through 15 are assigned the symbol H (for high byte). The name
of the 16-bit register is PC. The symbol PC (0-7) or PC (L) refers to the
low-order byte and PC (8-15) or PC (H) to the high-order byte.
Fig. 2.1: Block diagram of register
Information transfer from one register to another is designated in symbolic
form by means of a replacement operator. The statement
R2 ← R1
denotes a transfer of the content of register R1 into register R2. It
designates a replacement of the content of R2 by the content of R1. By
definition, the content of the source register R1 does not change after the
transfer.
A statement that specifies a register transfer implies that circuits are
available from the outputs of the source register to the inputs of the
destination register and that the destination register has a parallel load
capability. Normally, we want the transfer to occur only under a
predetermined control condition. This can be shown by means of an if-then
statement.
If (P = 1) then (R2 ← R1)
where P is a control signal generated in the control section. It is sometimes
convenient to separate the control variables from the register transfer
operation by specifying a control function. A control function is a Boolean
variable that is equal to 1 or 0. The control function is included in the
statement as follows:
P: R2 ← R1
The control condition is terminated with a colon. It symbolizes the
requirement that the transfer operation be executed by the hardware only if
P = 1.
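A gated transfer of this kind can be sketched in software. The following Python fragment (all names are illustrative, not from the text) models one positive clock transition for the statement P: R2 ← R1:

```python
# Illustrative model of the gated transfer P: R2 <- R1.
def clock_edge(P, R1, R2):
    """Return the new content of R2 after one positive clock transition."""
    # The transfer is executed by the hardware only when P = 1;
    # otherwise R2 keeps its previous content.
    return R1 if P == 1 else R2

R1, R2 = 0b1010, 0b0000
R2 = clock_edge(P=0, R1=R1, R2=R2)   # control inactive: no transfer
assert R2 == 0b0000
R2 = clock_edge(P=1, R1=R1, R2=R2)   # P = 1: R2 <- R1
assert R2 == 0b1010
```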
Every statement written in a register transfer notation implies a hardware
construction for implementing the transfer. Fig.2.2 shows the block diagram
that depicts the transfer from R1 to R2. The n outputs of register R1 are
connected to the n inputs of register R2. The letter n will be used to indicate
any number of bits for the register. It will be replaced by an actual number
when the length of the register is known. Register R2 has a load input that
is activated by the control variable P. It is assumed that the control variable
is synchronized with the same clock as the one applied to the register. As
shown in the timing diagram (b), P is activated in the control section by the
rising edge of a clock pulse at time t. The next positive transition of the
clock at time t + 1 finds the load input active and the data inputs of R2 are
then loaded into the register in parallel. P may go back to 0 at time t + 1;
otherwise, the transfer will occur with every clock pulse transition while P
remains active.
Fig. 2.2: Transfer from R1 to R2 when P = 1
Note that the clock is not included as a variable in the register transfer
statements. It is assumed that all transfers occur during a clock edge
transition. Even though the control condition such as P becomes active just
after time t, the actual transfer does not occur until the register is triggered
by the next positive transition of the clock at time t + 1.
The basic symbols of the register transfer notation are listed in Table 2.1.
Registers are denoted by capital letters, and numerals may follow the
letters. Parentheses are used to denote a part of a register by specifying the
range of bits or by giving a symbol name to a portion of a register. The
arrow denotes a transfer of information and the direction of transfer. A
comma is used to separate two or more operations that are executed at the
same time. The statement
T: R2 ← R1, R1 ← R2
denotes an operation that exchanges the contents of two registers during
one common clock pulse provided that T = 1. This simultaneous operation is
possible with registers that have edge-triggered flip-flops.
Symbol                  Description                       Examples
Letters (and numerals)  Denotes a register                MAR, R2
Parentheses ( )         Denotes a part of a register      R2(0-7), R2(L)
Arrow ←                 Denotes transfer of information   R2 ← R1
Comma ,                 Separates two microoperations     R2 ← R1, R1 ← R2
Table 2.1: Basic Symbols for Register Transfers
2.4 Bus and Memory Transfers
A typical digital computer has many registers, and paths must be provided
to transfer information from one register to another. The number of wires will
be excessive if separate lines are used between each register and all other
registers in the system. A more efficient scheme for transferring information
between registers in a multiple-register configuration is a common bus
system. A bus structure consists of a set of common lines, one for each bit
of a register, through which binary information is transferred one at a time.
Control signals determine which register is selected by the bus during each
particular register transfer.
One way of constructing a common bus system is with multiplexers. The
multiplexers select the source register whose binary information is then
placed on the bus. The construction of a bus system for four registers is
shown in Fig.2.3. Each register has four bits, numbered 0 through 3. The
bus consists of four 4 x 1 multiplexers each having four data inputs, 0
through 3, and two selection inputs, S1 and S0. In order not to complicate
the diagram with 16 lines crossing each other, we use labels to show the
connections from the outputs of the registers to the inputs of the
multiplexers. For example, output 1 of register A is connected to input 0 of
MUX 1 because this input is labeled A1. The diagram shows that the bits in
the same significant position in each register are connected to the data
inputs of one multiplexer to form one line of the bus. Thus MUX 0
multiplexes the four 0 bits of the registers, MUX 1 multiplexes the four 1 bits
of the registers, and similarly for the other two bits.
Fig. 2.3: Bus system for four registers
The two selection lines S1 and S0 are connected to the selection inputs of
all four multiplexers. The selection lines choose the four bits of one register
and transfer them into the four-line common bus. When S1S0 = 00, the 0
data inputs of all four multiplexers are selected and applied to the outputs
that form the bus. This causes the bus lines to receive the content of
register A since the outputs of this register are connected to the 0 data
inputs of the multiplexers. Similarly, register B is selected if S1S0 = 01, and
so on. Table 2.2 shows the register that is selected by the bus for each of
the four possible binary values of the selection lines.
S1 S0 Register selected
0 0 A
0 1 B
1 0 C
1 1 D
Table 2.2: Function Table for Bus of Fig. 2.3
In general, a bus system will multiplex k registers of n bits each to produce
an n-line common bus. The number of multiplexers needed to construct the
bus is equal to n, the number of bits in each register. The size of each
multiplexer must be k x 1 since it multiplexes k data lines. For example, a
common bus for eight registers of 16 bits each requires 16 multiplexers, one
for each line in the bus. Each multiplexer must have eight data input lines
and three selection lines to multiplex one significant bit in the eight registers.
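As a sketch of this organization, the following Python function (hypothetical names) models one k × 1 multiplexer per bus line for the four-register bus of Fig. 2.3:

```python
def bus_value(registers, s1, s0, n_bits=4):
    """Model of Fig. 2.3: one multiplexer per bus line.

    registers holds the four register contents; (s1, s0) are the
    selection inputs common to all multiplexers.
    """
    k = (s1 << 1) | s0                  # which register the bus selects
    bus = 0
    for i in range(n_bits):             # MUX i drives bus line i
        bus |= ((registers[k] >> i) & 1) << i
    return bus

A, B, C, D = 0b0011, 0b0101, 0b1110, 0b1001
assert bus_value([A, B, C, D], 0, 0) == A    # S1S0 = 00 selects A
assert bus_value([A, B, C, D], 1, 1) == D    # S1S0 = 11 selects D
```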
The transfer of information from a bus into one of many destination registers
can be accomplished by connecting the bus lines to the inputs of all
destination registers and activating the load control of the particular
destination register selected. The symbolic statement for a bus transfer may
mention the bus or its presence may be implied in the statement. When the
bus is included in the statement, the register transfer is symbolized as
follows:
BUS ← C, R1 ← BUS
The content of register C is placed on the bus, and the content of the bus is
loaded into register R1 by activating its load control input. If the bus is
known to exist in the system, it may be convenient just to show the direct
transfer.
R1 ← C
From this statement the designer knows which control signals must be
activated to produce the transfer through the bus.
Three-State Bus Buffers
A bus system can be constructed with three-state gates instead of
multiplexers. A three-state gate is a digital circuit that exhibits three states.
Two of the states are signals equivalent to logic 1 and 0 as in a conventional
gate. The third state is a high-impedance state. The high-impedance state
behaves like an open circuit, which means that the output is disconnected
and does not have a logical significance. Three-state gates may perform
any conventional logic, such as AND or NAND. However, the one most
commonly used in the design of a bus system is the buffer gate.
The graphic symbol of a three-state buffer gate is shown in Fig.2.4. It is
distinguished from a normal buffer by having both a normal input and a
control input. The control input determines the output state. When the
control input is equal to 1, the output is enabled and the gate behaves like
any conventional buffer, with the output equal to the normal input. When the
control input is 0, the output is disabled and the gate goes to a high-
impedance state, regardless of the value in the normal input. The high-
impedance state of a three-state gate provides a special feature not
available in other gates. Because of this feature, a large number of three-
state gate outputs can be connected with wires to form a common bus line
without endangering loading effects.
Fig. 2.4: Graphic symbols for three-state buffer
The construction of a bus system with three-state buffers is demonstrated
in Fig.2.5. The outputs of four buffers are connected together to form a
single bus line. (It must be realized that this type of connection cannot be
done with gates that do not have three-state outputs). The control inputs to
the buffers determine which of the four normal inputs will communicate with
the bus line. No more than one buffer may be in the active state at any given
time. The connected buffers must be controlled so that only one three-state
buffer has access to the bus line while all other buffers are maintained in a
high-impedance state.
Fig. 2.5: Bus line with three state-buffers
One way to ensure that no more than one control input is active at any given
time is to use a decoder, as shown in the diagram. When the enable input
of the decoder is 0, all of its four outputs are 0, and the bus line is in a high-
impedance state because all four buffers are disabled. When the enable
input is active, one of the three-state buffers will be active, depending on the
binary value in the select inputs of the decoder. Careful investigation will
reveal that Fig.2.5 is another way of constructing a 4 x 1 multiplexer since
the circuit can replace the multiplexer in Fig.2.3.
To construct a common bus for four registers of n bits each using three-
state buffers, we need n circuits with four buffers in each as shown in
Fig.2.5. Each group of four buffers receives one significant bit from the four
registers. Each common output produces one of the lines for the common
bus for a total of n lines. Only one decoder is necessary to select between
the four registers.
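A sketch of one such bus line, with a 2 × 4 decoder guaranteeing that at most one three-state buffer drives it (names and the use of None for high impedance are illustrative):

```python
def bus_line(inputs, select, enable):
    """One bus line built from four three-state buffers (as in Fig. 2.5).

    High impedance is represented here by None: with enable = 0 every
    buffer is disabled and no value drives the line.
    """
    if not enable:
        return None                          # all decoder outputs 0: high-Z
    decoder_out = [int(i == select) for i in range(4)]
    # Exactly one control input is 1, so exactly one buffer drives the bus.
    return [a for a, c in zip(inputs, decoder_out) if c][0]

assert bus_line([1, 0, 1, 0], select=2, enable=1) == 1
assert bus_line([1, 0, 1, 0], select=1, enable=1) == 0
assert bus_line([1, 0, 1, 0], select=3, enable=0) is None
```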
Memory Transfer
The transfer of information from a memory word to the outside environment
is called a read operation. The transfer of new information to be stored into
the memory is called a write operation. A memory word will be symbolized
by the letter M. The particular memory word among the many available is
selected by the memory address during the transfer. It is necessary to
specify the address of M when writing memory transfer operations. This will
be done by enclosing the address in square brackets following the letter M.
Consider a memory unit that receives the address from a register, called the
address register, symbolized by AR. The data are transferred to another
register, called the data register, symbolized by DR. The read operation can
be stated as follows:
Read: DR ← M[AR]
This causes a transfer of information into DR from the memory word M
selected by the address in AR.
The write operation transfers the content of a data register to a memory
word M selected by the address. Assume that the input data are in register
R1 and the address is in AR. The write operation can be stated symbolically
as follows:
Write: M[AR] ← R1
This causes a transfer of information from R1 into the memory word M
selected by the address in AR.
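A minimal sketch of the two transfers, with the memory M modeled as a Python list indexed by the address register AR (all values are illustrative):

```python
# Read:  DR <- M[AR]      Write: M[AR] <- R1
M = [0] * 8               # a small 8-word memory
AR, R1 = 5, 0b1011        # address in AR, data in R1

M[AR] = R1                # write operation: store R1 at address AR
DR = M[AR]                # read operation: load DR from address AR
assert DR == 0b1011
```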
2.5 Arithmetic Microoperations
A microoperation is an elementary operation performed with the data
stored in registers. The microoperations most often encountered in digital
computers are classified into four categories:
1. Register transfer microoperations transfer binary information from
one register to another.
2. Arithmetic microoperations perform arithmetic operations on numeric
data stored in registers.
3. Logic microoperations perform bit manipulation operations on non-
numeric data stored in registers.
4. Shift microoperations perform shift operations on data stored in
registers.
The register transfer microoperation was introduced in Sec.2.3. This type of
microoperation does not change the information content when the binary
information moves from the source register to the destination register. The
other three types of microoperations change the information content during
the transfer. In this section we introduce a set of arithmetic microoperations.
In the next two sections we present the logic and shift microoperations.
The basic arithmetic microoperations are addition, subtraction, increment,
decrement, and shift. Arithmetic shifts are explained later in conjunction with
the shift microoperations. The arithmetic microoperation defined by the
statement
R3 ← R1 + R2
specifies an add microoperation. It states that the contents of register R1
are added to the contents of register R2 and the sum transferred to register
R3. To implement this statement with hardware we need three registers and
the digital component that performs the addition operation. The other basic
arithmetic microoperations are listed in Table 2.3. Subtraction is most often
implemented through complementation and addition. Instead of using the
minus operator, we can specify the subtraction by the following statement:
R3 ← R1 + R̅2̅ + 1
R̅2̅ is the symbol for the 1’s complement of R2. Adding 1 to the 1’s
complement produces the 2’s complement. Adding the contents of R1 to
the 2’s complement of R2 is equivalent to R1 – R2.
Symbolic designation    Description
R3 ← R1 + R2            Contents of R1 plus R2 transferred to R3
R3 ← R1 – R2            Contents of R1 minus R2 transferred to R3
R2 ← R̅2̅                Complement the contents of R2 (1’s complement)
R2 ← R̅2̅ + 1            2’s complement the contents of R2 (negate)
R3 ← R1 + R̅2̅ + 1       R1 plus the 2’s complement of R2 (subtraction)
R1 ← R1 + 1             Increment the contents of R1 by one
R1 ← R1 – 1             Decrement the contents of R1 by one
Table 2.3: Arithmetic Microoperations
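The subtraction-by-complement identity can be checked with a short sketch that keeps all arithmetic to n bits (the function name is an assumption for illustration):

```python
def subtract_via_complement(r1, r2, n_bits=4):
    """R3 <- R1 + (1's complement of R2) + 1, truncated to n bits."""
    mask = (1 << n_bits) - 1
    ones_complement = ~r2 & mask        # invert every bit of R2
    return (r1 + ones_complement + 1) & mask

assert subtract_via_complement(7, 3) == 4        # 7 - 3
# 3 - 7 yields the 2's complement representation of -4 (binary 1100)
assert subtract_via_complement(3, 7) == 0b1100
```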
The increment and decrement microoperations are symbolized by plus-one
and minus-one operations, respectively. These microoperations are
implemented with a combinational circuit or with a binary up-down counter.
The arithmetic operations of multiply and divide are not listed in Table 2.3.
These two operations are valid arithmetic operations but are not included in
the basic set of microoperations. The only place where these operations can
be considered as microoperations is in a digital system, where they are
implemented by means of a combinational circuit. In such a case, the
signals that perform these operations propagate through gates, and the
result of the operation can be transferred into a destination register by a
clock pulse as soon as the output signal propagates through the
combinational circuit. In most computers, the multiplication operation is
implemented with a sequence of add and shift microoperations. Division is
implemented with a sequence of subtract and shift microoperations. To
specify the hardware in such a case requires a list of statements that use
the basic microoperations of add, subtract and shift.
Binary Adder
To implement the add microoperation with hardware, we need the registers
that hold the data and the digital component that performs the arithmetic
addition. The digital circuit that forms the arithmetic sum of two bits and a
previous carry is called a full-adder. The digital circuit that generates the
arithmetic sum of two binary numbers of any lengths is called a binary
adder. The binary adder is constructed with full-adder circuits connected in
cascade, with the output carry from one full-adder connected to the input
carry of the next full-adder. Fig.2.6 shows the interconnections of four full-
adders (FA) to provide a 4-bit binary adder. The augend bits of A and the
addend bits of B are designated by subscript numbers from right to left, with
subscript 0 denoting the low-order bit. The carries are connected in a chain
through the full-adders. The input carry to the binary adder is C0 and the
output carry is C4. The S outputs of the full-adders generate the required
sum bits.
Fig. 2.6: 4-bit binary adder
An n-bit binary adder requires n full-adders. The output carry from each full-
adder is connected to the input carry of the next-high-order full-adder. The n
data bits for the A inputs come from one register (such as R1), and the n
data bits for the B inputs come from another register (such as R2). The sum
can be transferred to a third register or to one of the source registers (R1 or
R2), replacing its previous content.
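The ripple-carry structure of Fig. 2.6 can be sketched directly; bit lists are written low-order first, and all names are illustrative:

```python
def full_adder(a, b, c):
    """One FA stage: sum bit and carry out from two bits and a carry in."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def binary_adder(a_bits, b_bits, c0=0):
    """Cascade full-adders, chaining each carry into the next stage."""
    carry, s_bits = c0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        s_bits.append(s)
    return s_bits, carry                # sum bits and output carry

# 0110 (6) + 0011 (3) = 1001 (9); bit lists are low-order first
assert binary_adder([0, 1, 1, 0], [1, 1, 0, 0]) == ([1, 0, 0, 1], 0)
```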
Binary Adder-Subtractor
The subtraction of binary numbers can be done most conveniently by
means of complements. Remember that the subtraction, A – B can be done
by taking the 2’s complement of B and adding it to A. The 2’s complement
can be obtained by taking the 1’s complement and adding one to the least
significant pair of bits. The 1’s complement can be implemented with
inverters and a one can be added to the sum through the input carry.
The addition and subtraction operations can be combined into one common
circuit by including an exclusive-OR gate with each full-adder. A 4-bit adder-
subtractor circuit is shown in Fig.2.7. The mode input M controls the
operation. When M = 0 the circuit is an adder and when M = 1 the circuit
becomes a subtractor. Each exclusive-OR gate receives input M and one of
the inputs of B. When M = 0, we have B ⊕ 0 = B. The full-adders receive
the value of B, the input carry is 0, and the circuit performs A plus B. When
M = 1, we have B ⊕ 1 = B̅ and C0 = 1. The B inputs are all complemented
and a 1 is added through the input carry. The circuit performs the operation
A plus the 2’s complement of B. For unsigned numbers, this gives A – B if
A ≥ B or the 2’s complement of (B – A) if A < B. For signed numbers, the
result is A – B provided that there is no overflow.
Fig. 2.7: 4-bit adder-subtractor
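A behavioural sketch of the adder-subtractor, with the per-stage exclusive-OR and the mode input M feeding the input carry (the function name is hypothetical):

```python
def adder_subtractor(a, b, m, n_bits=4):
    """Fig. 2.7 behaviour: M = 0 gives A + B, M = 1 gives A - B."""
    carry, result = m, 0                 # the mode input also supplies C0
    for i in range(n_bits):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ m          # exclusive-OR gate on each B input
        s = ai ^ bi ^ carry              # full-adder sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result

assert adder_subtractor(0b0101, 0b0011, m=0) == 0b1000   # 5 + 3 = 8
assert adder_subtractor(0b0101, 0b0011, m=1) == 0b0010   # 5 - 3 = 2
```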
Binary Incrementer
The increment microoperation adds one to a number in a register. For
example, if a 4-bit register has a binary value 0110, it will go to 0111 after it
is incremented. This microoperation is easily implemented with a binary
counter. Every time the count enable is active, the clock pulse transition
increments the content of the register by one. There may be occasions
when the increment microoperation must be done with a combinational
circuit independent of a particular register. This can be accomplished by
means of half-adders connected in cascade.
The diagram of a 4-bit combinational circuit incrementer is shown in Fig.2.8.
One of the inputs to the least significant half-adder (HA) is connected to
logic-1 and the other input is connected to the least significant bit of the
number to be incremented. The output carry from one half-adder is
connected to one of the inputs of the next-higher-order half-adder. The
circuit receives the four bits from A0 through A3, adds one to it, and
generates the incremented output in S0 through S3. The output carry C4 will
be 1 only after incrementing binary 1111. This also causes outputs S0
through S3 to go to 0.
Fig. 2.8: 4-bit binary incrementer
The circuit of Fig.2.8 can be extended to an n-bit binary incrementer by
extending the diagram to include n half-adders. The least significant bit must
have one input connected to logic-1. The other inputs receive the number to
be incremented or the carry from the previous stage.
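The half-adder cascade of Fig. 2.8 can be sketched as follows (bit lists are low-order first; all names are illustrative):

```python
def half_adder(a, b):
    """Sum and carry of two bits."""
    return a ^ b, a & b

def incrementer(bits):
    """Cascade of half-adders with logic-1 into the least significant one."""
    carry, out = 1, []
    for a in bits:                      # bits[0] is the low-order bit
        s, carry = half_adder(a, carry)
        out.append(s)
    return out, carry                   # incremented bits and output carry

assert incrementer([0, 1, 1, 0]) == ([1, 1, 1, 0], 0)   # 0110 -> 0111
assert incrementer([1, 1, 1, 1]) == ([0, 0, 0, 0], 1)   # 1111 wraps to 0000
```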
Arithmetic Circuit
The arithmetic microoperations listed in Table 2.3 can be implemented in
one composite arithmetic circuit. The basic component of an arithmetic
circuit is the parallel adder. By controlling the data inputs to the adder, it is
possible to obtain different types of arithmetic operations.
The diagram of a 4-bit arithmetic circuit is shown in Fig.2.9. It has four full-
adder circuits that constitute the 4-bit adder and four multiplexers for
choosing different operations. There are two 4-bit inputs A and B and a
4-bit output D. The four inputs from A go directly to the X inputs of the binary
adder. Each of the four inputs from B is connected to the data inputs of the
multiplexers. The multiplexer data inputs also receive the complement of B.
The other two data inputs are connected to logic-0 and logic-1. Logic-0 is a
fixed voltage value (0 volts for TTL integrated circuits) and the logic-1 signal
can be generated through an inverter whose input is 0. The four multiplexers
are controlled by two selection inputs, S1 and S0. The input carry Cin goes to
the carry input of the FA in the least significant position. The other carries
are connected from one stage to the next.
Fig. 2.9: 4-bit arithmetic circuit
The output of the binary adder is calculated from the following arithmetic
sum:
D = A + Y + Cin
where A is the 4-bit binary number at the X inputs and Y is the 4-bit binary
number at the Y inputs of the binary adder. Cin is the input carry, which can
be equal to 0 or 1. Note that the symbol + in the equation above denotes an
arithmetic plus. By controlling the value of Y with the two selection inputs S1
and S0 and making Cin equal to 0 or 1, it is possible to generate the eight
arithmetic microoperations listed in Table 2.4.
S1 S0 Cin   Input Y   Output D = A + Y + Cin   Microoperation
0  0  0     B         D = A + B                Add
0  0  1     B         D = A + B + 1            Add with carry
0  1  0     B̅         D = A + B̅                Subtract with borrow
0  1  1     B̅         D = A + B̅ + 1            Subtract
1  0  0     0         D = A                    Transfer A
1  0  1     0         D = A + 1                Increment A
1  1  0     1         D = A – 1                Decrement A
1  1  1     1         D = A                    Transfer A
Table 2.4: Arithmetic Circuit Function Table
When S1S0 = 00, the value of B is applied to the Y inputs of the adder. If
Cin = 0, the output D = A + B. If Cin = 1, output D = A + B + 1. Both cases
perform the add microoperation with or without adding the input carry.
When S1S0 = 01, the complement of B is applied to the Y inputs of the
adder. If Cin = 1, then D = A + B̅ + 1. This produces A plus the 2’s
complement of B, which is equivalent to the subtraction A – B. When
Cin = 0, then D = A + B̅. This is equivalent to a subtract with borrow, that is,
A – B – 1.
When S1S0 = 10, the inputs from B are neglected, and instead, all 0’s are
inserted into the Y inputs. The output becomes D = A + 0 + Cin. This gives
D = A when Cin = 0 and D = A + 1 when Cin = 1. In the first case we have a
direct transfer from input A to output D. In the second case, the value of A is
incremented by 1.
When S1S0 = 11, all 1’s are inserted into the Y inputs of the adder to
produce the decrement operation D = A – 1 when Cin = 0. This is because a
number with all 1’s is equal to the 2’s complement of 1 (the 2’s complement
of binary 0001 is 1111). Adding a number A to the 2’s complement of 1
produces D = A + 2’s complement of 1 = A – 1. When Cin = 1, then
D = A – 1 + 1 = A, which causes a direct transfer from input A to output D.
Note that the microoperation D = A is generated twice, so there are only
seven distinct microoperations in the arithmetic circuit.
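Table 2.4 can be verified with a behavioural sketch in which the multiplexers choose Y and the adder forms D = A + Y + Cin (function and variable names are assumptions):

```python
def arithmetic_circuit(a, b, s1, s0, cin, n_bits=4):
    """D = A + Y + Cin, with Y selected by S1S0 as in Table 2.4."""
    mask = (1 << n_bits) - 1
    # S1S0 = 00: B, 01: complement of B, 10: all 0's, 11: all 1's
    y = [b, ~b & mask, 0, mask][(s1 << 1) | s0]
    return (a + y + cin) & mask

A, B = 0b0110, 0b0011                 # A = 6, B = 3
assert arithmetic_circuit(A, B, 0, 0, 0) == 9    # add
assert arithmetic_circuit(A, B, 0, 1, 1) == 3    # subtract: A - B
assert arithmetic_circuit(A, B, 1, 0, 1) == 7    # increment A
assert arithmetic_circuit(A, B, 1, 1, 0) == 5    # decrement A
```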
2.6 Logic Microoperations
Logic microoperations specify binary operations for strings of bits stored in
registers. These operations consider each bit of the register separately and
treat them as binary variables. For example, the exclusive-OR
microoperation with the contents of two registers R1 and R2 is symbolized
by the statement
P: R1 ← R1 ⊕ R2
It specifies a logic microoperation to be executed on the individual bits of the
registers provided that the control variable P = 1. As a numerical example,
assume that each register has four bits. Let the content of R1 be 1010 and
the content of R2 be 1100. The exclusive-OR microoperation stated above
symbolizes the following logic computation:
1010 Content of R1
1100 Content of R2
0110 Content of R1 after P = 1
The content of R1, after the execution of the microoperation, is equal to the
bit-by-bit exclusive-OR operation on pairs of bits in R2 and previous values
of R1. The logic microoperations are seldom used in scientific computations,
but they are very useful for bit manipulation of binary data and for making
logical decisions.
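Python’s bitwise operators reproduce these bit-by-bit microoperations directly; the following checks use the 4-bit register values from the example above:

```python
MASK = 0b1111                 # keep results to four bits
R1, R2 = 0b1010, 0b1100       # register contents from the example

assert (R1 ^ R2) & MASK == 0b0110   # exclusive-OR, as computed above
assert (R1 & R2) & MASK == 0b1000   # AND microoperation
assert (R1 | R2) & MASK == 0b1110   # OR microoperation
assert (~R1) & MASK == 0b0101       # complement (1's complement)
```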
Special symbols will be adopted for the logic microoperations OR, AND, and
complement, to distinguish them from the corresponding symbols used to
express Boolean functions. The symbol ∨ will be used to denote an OR
microoperation and the symbol ∧ to denote an AND microoperation. The
complement microoperation is the same as the 1’s complement and uses a
bar on top of the symbol that denotes the register name. By using different
symbols, it will be possible to differentiate between a logic microoperation
and a control (or Boolean) function. Another reason for adopting two sets of
symbols is to be able to distinguish the symbol +, when used to symbolize
an arithmetic plus, from a logic OR operation. Although the + symbol has
two meanings, it will be possible to distinguish between them by noting
where the symbol occurs. When the symbol + occurs in a microoperation, it
will denote an arithmetic plus. When it occurs in a control (or Boolean)
function, it will denote an OR operation. We will never use it to symbolize
an OR microoperation. For example, in the statement
P + Q: R1 ← R2 + R3, R4 ← R5 ∨ R6
the + between P and Q is an OR operation between two binary variables of
a control function. The + between R2 and R3 specifies an add
microoperation. The OR microoperation is designated by the symbol ∨
between registers R5 and R6.
List of Logic Microoperations
There are 16 different logic operations that can be performed with two
binary variables. They can be determined from all possible truth tables
obtained with two binary variables as shown in Table 2.5. In this table, each
of the 16 columns F0 through F15 represents a truth table of one possible
Boolean function for the two variables x and y. Note that the functions are
determined from the 16 binary combinations that can be assigned to F.
x y F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Table 2.5: Truth Tables for 16 Functions of Two Variables
The 16 Boolean functions of two variables x and y are expressed in
algebraic form in the first column of Table 2.6. The 16 logic microoperations
are derived from these functions by replacing variable x by the binary
content of register A and variable y by the binary content of register B. It is
important to realize that the Boolean functions listed in the first column of
Table 2.6 represent a relationship between two binary variables x and y.
The logic microoperations listed in the second column represent a
relationship between the binary content of two registers A and B. Each bit
of the register is treated as a binary variable and the microoperation is
performed on the string of bits stored in the registers.
Boolean function Microoperation Name
F0 = 0 F ← 0 Clear
F1 = xy F ← A ∧ B AND
F2 = xy’ F ← A ∧ B’
F3 = x F ← A Transfer A
F4 = x’y F ← A’ ∧ B
F5 = y F ← B Transfer B
F6 = x ⊕ y F ← A ⊕ B Exclusive-OR
F7 = x + y F ← A ∨ B OR
F8 = (x + y)’ F ← (A ∨ B)’ NOR
F9 = (x ⊕ y)’ F ← (A ⊕ B)’ Exclusive-NOR
F10 = y’ F ← B’ Complement B
F11 = x + y’ F ← A ∨ B’
F12 = x’ F ← A’ Complement A
F13 = x’ + y F ← A’ ∨ B
F14 = (xy)’ F ← (A ∧ B)’ NAND
F15 = 1 F ← all 1’s Set to all 1’s
Table 2.6: Sixteen Logic Microoperations
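Since each column Fi of Table 2.5 is simply the 4-bit binary expansion of the index i, all 16 functions can be generated programmatically; a minimal sketch (the function name is an assumption):

```python
def f(i, x, y):
    """Output of function Fi for inputs x, y (column Fi of Table 2.5)."""
    # rows are ordered (0,0), (0,1), (1,0), (1,1); column Fi is i in binary
    row = 2 * x + y
    return (i >> (3 - row)) & 1

# Spot checks against Table 2.6: F1 is AND, F6 is XOR, F7 is OR
assert all(f(1, x, y) == (x & y) for x in (0, 1) for y in (0, 1))
assert all(f(6, x, y) == (x ^ y) for x in (0, 1) for y in (0, 1))
assert all(f(7, x, y) == (x | y) for x in (0, 1) for y in (0, 1))
```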
Hardware Implementation
The hardware implementation of logic microoperations requires that logic
gates be inserted for each bit or pair of bits in the registers to perform the
required logic function. Although there are 16 logic microoperations, most
computers use only four – AND, OR, XOR (exclusive-OR), and
complement–from which all others can be derived.
Fig. 2.10 shows one stage of a circuit that generates the four basic logic
microoperations. It consists of four gates and a multiplexer. Each of the four
logic operations is generated through a gate that performs the required
logic. The outputs of the gates are applied to the data inputs of the
multiplexer. The two selection inputs S1 and S0 choose one of the data
inputs of the multiplexer and direct its value to the output. The diagram
shows one typical stage with subscript i. For a logic circuit with n bits, the
diagram must be repeated n times for i = 0, 1, 2, …, n-1. The selection
variables are applied to all stages. The function table in Fig.2.10 (b) lists the
logic microoperations obtained for each combination of the selection
variables.
Fig. 2.10: One stage of logic circuit
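The stage of Fig. 2.10 can be modeled behaviorally. The gate ordering on the multiplexer inputs (S1S0 = 00 for AND, 01 for OR, 10 for XOR, 11 for complement) is an assumption here, since the figure itself is not reproduced:

```python
def logic_stage(ai, bi, s1, s0):
    """One bit stage: four gates feed a 4x1 mux selected by S1 S0."""
    gates = [ai & bi,   # S1S0 = 00: AND
             ai | bi,   # S1S0 = 01: OR
             ai ^ bi,   # S1S0 = 10: XOR
             1 - ai]    # S1S0 = 11: complement of A
    return gates[2 * s1 + s0]

print(logic_stage(1, 1, 0, 0))  # 1 (AND of two 1 bits)
print(logic_stage(1, 0, 1, 1))  # 0 (complement of Ai = 1)
```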
Some Applications
Logic microoperations are very useful for manipulating individual bits or a
portion of a word stored in a register. They can be used to change bit
values, delete a group of bits, or insert new bit values into a register. The
following examples show how the bits of one register (designated by A) are
manipulated by logic microoperations as a function of the bits of another
register (designated by B). In a typical application, register A is a processor
register and the bits of register B constitute a logic operand extracted from
memory and placed in register B.
The selective-set operation sets to 1 the bits in register A where there are
corresponding 1’s in register B. It does not affect bit positions that have 0’s
in B. The following numerical example clarifies this operation:
1010 A before
1100 B (logic operand)
1110 A after
The two leftmost bits of B are 1’s, so the corresponding bits of A are set to
1. One of these two bits was already set and the other has been changed
from 0 to 1. The two bits of A with corresponding 0’s in B remain
unchanged. The example above serves as a truth table since it has all four
possible combinations of two binary variables. From the truth table we note
that the bits of A after the operation are obtained from the logic-OR
operation of bits in B and previous values of A. Therefore, the OR
microoperation can be used to selectively set bits of a register.
The selective-complement operation complements bits in A where there are
corresponding 1’s in B. It does not affect bit positions that have 0’s in B.
For example:
1010 A before
1100 B (logic operand)
0110 A after
Again the two leftmost bits of B are 1’s, so the corresponding bits of A are
complemented. This example again can serve as a truth table from which
one can deduce that the selective-complement operation is just an
exclusive-OR microoperation. Therefore, the exclusive-OR microoperation
can be used to selectively complement bits of a register.
The selective-clear operation clears to 0 the bits in A only where there are
corresponding 1’s in B. For example:
1010 A before
1100 B (logic operand)
0010 A after
Again the two leftmost bits of B are 1’s, so the corresponding bits of A are
cleared to 0. One can deduce that the Boolean operation performed on the
individual bits is AB’. The corresponding logic microoperation is
A ← A ∧ B’
The mask operation is similar to the selective-clear operation except that the
bits of A are cleared only where there are corresponding 0’s in B. The mask
operation is an AND microoperation as seen from the following numerical
example:
1010 A before
1100 B (logic operand)
1000 A after masking
The two rightmost bits of A are cleared because the corresponding bits of B
are 0s. The two leftmost bits are left unchanged because the corresponding
bits of B are 1s. The mask operation is more convenient to use than the
selective-clear operation because most computers provide an AND
instruction, and few provide an instruction that executes the microoperation
for selective-clear.
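The four selective operations above each reduce to a single bitwise operator; a sketch using the 4-bit operands from the examples (the function names and width are illustrative assumptions):

```python
def selective_set(a, b):
    return a | b                              # OR sets where B has 1's

def selective_complement(a, b):
    return a ^ b                              # XOR complements where B has 1's

def selective_clear(a, b, w=4):
    return a & (~b & ((1 << w) - 1))          # A AND B' clears where B has 1's

def mask(a, b):
    return a & b                              # AND clears where B has 0's

a, b = 0b1010, 0b1100
print(format(selective_set(a, b), "04b"))         # 1110
print(format(selective_complement(a, b), "04b"))  # 0110
print(format(selective_clear(a, b), "04b"))       # 0010
print(format(mask(a, b), "04b"))                  # 1000
```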
The insert operation inserts a new value into a group of bits. This is done by
first masking the bits and then ORing them with the required value. For
example, suppose that an A register contains eight bits, 0110 1010. To
replace the four leftmost bits by the value 1001 we first mask the four
unwanted bits:
0110 1010 A before
0000 1111 B (mask)
0000 1010 A after masking
and then insert the new value:
0000 1010 A before
1001 0000 B (insert)
1001 1010 A after insertion
The mask operation is an AND microoperation and the insert operation is an
OR microoperation.
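The two-step insert can be sketched directly with the 8-bit values from the example:

```python
A = 0b0110_1010   # A before
A &= 0b0000_1111  # mask: AND clears the four leftmost bits -> 0000 1010
A |= 0b1001_0000  # insert: OR brings in the new value      -> 1001 1010
print(format(A, "08b"))  # 10011010
```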
The clear operation compares the words in A and B and produces an all 0s
result if the two numbers are equal. This operation is achieved by an
exclusive-OR microoperation as shown by the following example:
1010 A
1010 B
0000 A ← A ⊕ B
When A and B are equal, the two corresponding bits are either both 0 or
both 1. In either case the exclusive-OR operation produces a 0. The all-0s
result is then checked to determine if the two numbers were equal.
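This comparison amounts to a word-wide XOR followed by a zero test; a sketch (the function name is an assumption):

```python
def equal(a, b):
    """An all-0's XOR result means every pair of bits matched."""
    return (a ^ b) == 0

print(equal(0b1010, 0b1010))  # True
print(equal(0b1010, 0b1100))  # False
```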
2.7 Shift Microoperations
Shift microoperations are used for serial transfer of data. They are also used
in conjunction with arithmetic, logic, and other data-processing operations.
The contents of a register can be shifted to the left or the right. At the same
time that the bits are shifted, the first flip-flop receives its binary information
from the serial input. During a shift-left operation the serial input transfers a
bit into the rightmost position. During a shift-right operation the serial input
transfers a bit into the leftmost position. The information transferred through
the serial input determines the type of shift. There are three types of shifts:
logical, circular, and arithmetic.
A logical shift is one that transfers 0 through the serial input. We will adopt
the symbols shl and shr for logical shift-left and shift-right microoperations.
For example:
R1 ← shl R1
R2 ← shr R2
are two microoperations that specify a 1-bit shift to the left of the content of
register R1 and a 1-bit shift to the right of the content of register R2. The
register symbol must be the same on both sides of the arrow. The bit
transferred to the end position through the serial input is assumed to be 0
during a logical shift.
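A sketch of the two logical shifts on a hypothetical 4-bit register, with 0 entering through the serial input:

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def shl(r):
    """Logical shift-left: 0 enters the rightmost position."""
    return (r << 1) & MASK

def shr(r):
    """Logical shift-right: 0 enters the leftmost position."""
    return r >> 1

print(format(shl(0b1010), "04b"))  # 0100
print(format(shr(0b1010), "04b"))  # 0101
```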
The circular shift (also known as a rotate operation) circulates the bits of the
register around the two ends without loss of information. This is
accomplished by connecting the serial output of the shift register to its serial
input. We will use the symbols cil and cir for the circular shift left and right,
respectively. The symbolic notation for the shift microoperations is shown in
Table 2.7.
Symbolic designation Description
R ← shl R Shift-left register R
R ← shr R Shift-right register R
R ← cil R Circular shift-left register R
R ← cir R Circular shift-right register R
R ← ashl R Arithmetic shift-left R
R ← ashr R Arithmetic shift-right R
Table 2.7: Shift Microoperations
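The circular shifts simply route the bit falling off one end back into the other; a 4-bit sketch whose function names follow the cil/cir symbols above:

```python
def cil(r, w=4):
    """Circular shift-left: the leftmost bit re-enters on the right."""
    return ((r << 1) | (r >> (w - 1))) & ((1 << w) - 1)

def cir(r, w=4):
    """Circular shift-right: the rightmost bit re-enters on the left."""
    return (r >> 1) | ((r & 1) << (w - 1))

print(format(cil(0b1010), "04b"))  # 0101
print(format(cir(0b1010), "04b"))  # 0101
```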
An arithmetic shift is a microoperation that shifts a signed binary number to
the left or right. An arithmetic shift-left multiplies a signed binary number by
2. An arithmetic shift-right divides the number by 2. Arithmetic shifts must
leave the sign bit unchanged because the sign of the number remains the
same when it is multiplied or divided by 2. The leftmost bit in a register holds
the sign bit, and the remaining bits hold the number. The sign bit is 0 for
positive and 1 for negative. Negative numbers are in 2’s complement form.
Fig.2.11 shows a typical register of n bits. Bit Rn-1 in the leftmost position
holds the sign bit. Rn-2 is the most significant bit of the number and R0 is the
least significant bit. The arithmetic shift-right leaves the sign bit unchanged
and shifts the number (including the sign bit) to the right. Thus Rn-1 remains
the same, Rn-2 receives the bit from Rn-1, and so on for the other bits in the
register. The bit in R0 is lost.
Fig. 2.11: Arithmetic shift right
The arithmetic shift-left inserts a 0 into R0, and shifts all other bits to the left.
The initial bit of Rn-1 is lost and replaced by the bit from Rn-2. A sign reversal
occurs if the bit in Rn-1 changes in value after the shift. This happens if the
multiplication by 2 causes an overflow. An overflow occurs after an
arithmetic shift left if initially, before the shift, Rn-1 is not equal to Rn-2. An
overflow flip-flop Vs can be used to detect an arithmetic shift-left overflow.
Vs = Rn-1 ⊕ Rn-2
If Vs = 0, there is no overflow, but if Vs = 1, there is an overflow and a sign
reversal after the shift. Vs must be transferred into the overflow flip-flop with
the same clock pulse that shifts the register.
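A sketch of both arithmetic shifts on a hypothetical 4-bit register, including the overflow bit Vs = Rn-1 ⊕ Rn-2 sampled before the left shift:

```python
WIDTH = 4

def ashr(r):
    """Arithmetic shift-right: sign bit Rn-1 stays; R0 is lost."""
    sign = r & (1 << (WIDTH - 1))
    return sign | (r >> 1)

def ashl(r):
    """Arithmetic shift-left: 0 enters R0; returns (result, Vs)."""
    vs = ((r >> (WIDTH - 1)) & 1) ^ ((r >> (WIDTH - 2)) & 1)
    return (r << 1) & ((1 << WIDTH) - 1), vs

print(format(ashr(0b1010), "04b"))  # 1101 (-6 / 2 = -3 in 2's complement)
res, vs = ashl(0b1010)
print(format(res, "04b"), vs)       # 0100 1 (overflow: the sign reversed)
```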
Hardware Implementation
A possible choice for a shift unit would be a bidirectional shift register with
parallel load. Information can be transferred to the register in parallel and
then shifted to the right or left. In this type of configuration, a clock pulse is
needed for loading the data into the register, and another pulse is needed to
initiate the shift. In a processor unit with many registers it is more efficient to
implement the shift operation with a combinational circuit. In this way the
content of a register that has to be shifted is first placed onto a common bus
whose output is connected to the combinational shifter, and the shifted
number is then loaded back into the register. This requires only one clock
pulse for loading the shifted value into the register.
A combinational circuit shifter can be constructed with multiplexers as
shown in Fig.2.12. The 4-bit shifter has four data inputs, A0 through A3, and
four data outputs, H0 through H3. There are two serial inputs, one for shift
left (IL) and the other for shift right (IR). When the selection input S=0, the
input data are shifted right (down in the diagram). When S=1, the input data
are shifted left (up in the diagram). The function table in Fig.2.12 shows
which input goes to each output after the shift. A shifter with n data inputs
and outputs requires n multiplexers. The two serial inputs can be controlled
by another multiplexer to provide the three possible types of shifts.
Fig. 2.12: 4-bit combinational circuit shifter
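The shifter of Fig. 2.12 can be modeled as four 2x1 multiplexers. The routing below follows the text (serial input IR enters the leftmost position on a right shift, IL enters the rightmost on a left shift); since the figure is not reproduced, the exact function table is an assumption:

```python
def shifter4(a, s, i_r=0, i_l=0):
    """a is the bit list [A0, A1, A2, A3] (A3 leftmost); returns [H0..H3]."""
    if s == 0:
        # shift right: each bit moves one place toward A0; IR fills H3
        return [a[1], a[2], a[3], i_r]
    # shift left: each bit moves one place toward A3; IL fills H0
    return [i_l, a[0], a[1], a[2]]

print(shifter4([1, 0, 1, 0], s=0, i_r=1))  # [0, 1, 0, 1]
print(shifter4([1, 0, 1, 0], s=1, i_l=0))  # [0, 1, 0, 1]
```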
2.8 Arithmetic Logic Shift Unit
Instead of having individual registers performing the microoperations
directly, computer systems employ a number of storage registers connected
to a common operational unit called an arithmetic logic unit, abbreviated
ALU. To perform a microoperation, the contents of specified registers are
placed in the inputs of the common ALU. The ALU performs an operation
and the result of the operation is then transferred to a destination register.
The ALU is a combinational circuit so that the entire register transfer
operation from the source registers through the ALU and into the destination
register can be performed during one clock pulse period. The shift
microoperations are often performed in a separate unit, but sometimes the
shift unit is made part of the overall ALU.
The arithmetic, logic, and shift circuits introduced in previous sections can
be combined into one ALU with common selection variables. One stage of
an arithmetic logic shift unit is shown in Fig.2.13. The subscript i designates
a typical stage. Inputs Ai and Bi are applied to both the arithmetic and logic
units. A particular microoperation is selected with inputs S1 and S0. A 4 x 1
multiplexer at the output chooses between an arithmetic output in Ei and a
logic output in Hi. The data in the multiplexer are selected with inputs S3
and S2. The other two data inputs to the multiplexer receive inputs Ai-1 for
the shift-right operation and Ai+1 for the shift-left operation. Note that the
diagram shows just one typical stage. The circuit of Fig.2.13 must be
repeated n times for an n-bit ALU. The output carry Ci+1 of a given
arithmetic stage must be connected to the input carry Ci of the next stage in
sequence. The input carry to the first stage is the input carry Cin, which
provides a selection variable for the arithmetic operations.
Fig. 2.13: One stage of arithmetic logic shift unit
The circuit whose one stage is specified in Fig.2.13 provides eight arithmetic
operations, four logic operations, and two shift operations. Each operation is
selected with the five variables S3, S2, S1, S0, and Cin. The input carry Cin is
used for selecting an arithmetic operation only.
Table 2.8 lists the 14 operations of the ALU. The first eight are arithmetic
operations and are selected with S3S2=00. The next four are logic
operations and are selected with S3S2=01. The input carry has no effect
during the logic operations and is marked with don’t-care x’s. The last two
operations are shift operations and are selected with S3S2=10 and 11. The
other three selection inputs have no effect on the shift.
Operation Select Operation Function
S3 S2 S1 S0 Cin
0 0 0 0 0 F = A Transfer A
0 0 0 0 1 F = A + 1 Increment A
0 0 0 1 0 F = A + B Addition
0 0 0 1 1 F = A + B + 1 Add with carry
0 0 1 0 0 F = A + B’ Subtract with borrow
0 0 1 0 1 F = A + B’ + 1 Subtraction
0 0 1 1 0 F = A – 1 Decrement A
0 0 1 1 1 F = A Transfer A
0 1 0 0 X F = A ∧ B AND
0 1 0 1 X F = A ∨ B OR
0 1 1 0 X F = A ⊕ B XOR
0 1 1 1 X F = A’ Complement A
1 0 X X X F = shr A Shift right A into F
1 1 X X X F = shl A Shift left A into F
Table 2.8: Function Table for Arithmetic Logic Shift Unit
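Table 2.8 can be modeled behaviorally at the word level. This is only a sketch: the real circuit is one stage per bit with a carry chain, and the 4-bit width and function name here are assumptions:

```python
def alu(s3, s2, s1, s0, cin, a, b, w=4):
    """Word-level model of the arithmetic logic shift unit of Table 2.8."""
    m = (1 << w) - 1
    if (s3, s2) == (0, 0):
        # arithmetic: second operand is 0, B, B', or all 1's; Cin is added
        y = (0, b, b ^ m, m)[2 * s1 + s0]
        return (a + y + cin) & m
    if (s3, s2) == (0, 1):
        # logic: Cin is a don't-care
        return (a & b, a | b, a ^ b, a ^ m)[2 * s1 + s0]
    if (s3, s2) == (1, 0):
        return a >> 1          # shift right A into F
    return (a << 1) & m        # shift left A into F

print(alu(0, 0, 0, 1, 0, 3, 5))  # 8  (addition)
print(alu(0, 0, 1, 0, 1, 7, 5))  # 2  (subtraction: A + B' + 1)
```

Note how the four arithmetic operand choices (0, B, B’, all 1’s) combined with Cin yield all eight rows of the arithmetic half of the table, including decrement, since A + all 1’s equals A – 1 modulo 2^w.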
2.9 Summary
This unit dealt with Register transfer language, Bus selection, Arithmetic
circuits, Logical circuits and Arithmetic logic shift unit in detail.
Self Assessment Questions
1. The operations executed on data stored in registers are called
__________.
2. The symbolic notation used to describe the microoperation transfers
among registers is called a __________________.
3. The register that holds an address for the memory unit is usually called
a _________________.
4. It is convenient to separate the control variables from the register
transfer operation by specifying a ______________.
5. A more efficient scheme for transferring information between registers in
a multiple-register configuration is a ______________.
2.10 Terminal Questions
1. What is Register Transfer Language? Explain.
2. Give and explain the block diagram of the bus system with four
registers.
3. Explain the various arithmetic microoperations performed in registers.
4. Give and explain one stage of logic circuit.
5. Give and explain one stage of arithmetic logic shift unit.
2.11 Answers
Self Assessment Questions:
1. microoperations
2. register transfer language (RTL)
3. memory address register (MAR)
4. control function
5. common bus system
Terminal Questions:
1. Refer Section 2.2
2. Refer Section 2.4
3. Refer Section 2.5
4. Refer Section 2.6
5. Refer Section 2.8
Unit 3 Basic Structure of a Digital Computer
Structure:
3.1 Introduction
Objectives
3.2 Mechanical and Electromechanical Ancestors
3.3 Structure of a Computer System
3.4 Arithmetic Logic Unit
3.5 Control Unit
3.6 Bus Structure
3.7 Von Neumann Architecture
3.8 Summary
3.9 Terminal Questions
3.10 Answers
3.1 Introduction
A computer is a digital device that is capable of computing and processing
data. Computers run the instructions that are given to them. Instructions are
low-level commands such as: add or subtract two numbers, or compare two
numbers and run one instruction if the comparison is true and another if it
is false.
The list of instructions is called a program and the internal storage is called
memory. Information fed to a computer can be categorized as either
instruction or data. Instructions are the explicit commands that govern
the transfer of information within the computer as well as between the
computer and its I/O devices and specify the operations to be
performed.
A computer is defined as a machine for manipulating data according to a list
of instructions. Earlier the term "computer" referred to a person who
performed numerical calculations often with the aid of a mechanical
calculating device.
Objectives:
By the end of Unit 3, you should be able to:
1. Define mini computers, micro computers, main frames and super
computers.
2. Explain the basic structure of the computer.
3. Discuss the Central Processing Unit of the computer
4. Explain the bus or interconnection system.
3.2 Mechanical and Electromechanical ancestors
The first machine to attract widespread attention was built in 1642, by a
French philosopher and scientist B. Pascal. This machine was a
mechanical counter for carrying operations like addition and subtraction. It
consisted of two sets of gears with teeth (counter wheels) for representing
decimal numbers. Each gear had 10 decimal digits engraved on it. The
position of the dial indicated the decimal value. Each set of dials was used
to temporarily hold a number like a register. One special register was
considered like an accumulator which held the running total. There was one
more register which was used to enter a number to be added or subtracted
from the accumulator. Thus when the machine was set in motion, the
numbers in the two sets of dials were added, with the result appearing in the
accumulator. Two main technical innovations were:
1. A ratchet device to transfer a carry automatically from one place to next.
2. A means of storing a negative number referred as complement
representation.
By means of complement number representation the machine could
accomplish both addition and subtraction.
In 1671, the German mathematician and philosopher Gottfried Leibniz
constructed a calculator that could perform multiplication and division along
with addition and subtraction. Leibniz's machine performed these additional
operations in a repetitive, step-by-step fashion using chains and pulleys. This
machine was the forerunner of many machines that came to be known as
four-function calculators.
In 1750, punched cards were developed to specify weaving patterns in
textile technology; the symbols stored on the cards were used to represent
instructions. After a series of refinements, Joseph Marie Jacquard improved
the textile loom in 1801. His machine used a series of punched paper cards:
all the power was supplied mechanically, and all the control came from the
cards as they moved through the loom apparatus. The presence or absence
of a hole dictated the movements of parts of the loom, so the cards served
as a template that allowed the loom to weave intricate patterns
automatically. The loom was thus a programmable process-control machine
with the "program" supplied on punched cards. The resulting Jacquard loom
was an important step in the development of computers because the use of
punched cards to define woven patterns can be viewed as an early, albeit
limited, form of programmability.
The ideas behind the general purpose programmable computer were
developed by Charles Babbage in the 19th century. He designed two
machines: Difference Engine and Analytical Engine.
Difference Engine: The difference engine was designed to calculate the
entries of a table automatically and transfer them via steel punches to an
engineer's plate from which the tables could be printed. The only arithmetic
operation performed by this machine was addition. However using a
mathematical technique called the method of finite differences, a large number
of useful formulas could be computed either exactly or approximately using
only additions. The formulas included polynomials and trigonometric
functions like the sine function. The structure of Babbage's Difference Engine
is as shown in figure 3.1 below.
[Figure: a chain of registers, each adjacent pair connected by an adder,
with initial values loaded into the registers]
Figure 3.1: Structure of Babbage’s Difference Engine
The difference engine consisted of a number of mechanical registers. Each
register consisted of a set of counter wheels and could store a decimal
number. Pairs of adjacent registers were connected by an adding
mechanism. To compute results, initial values were loaded into the registers.
When the difference engine was driven by a suitable motor it could then
perform a series of steps to give an answer. This machine could
accommodate third degree polynomials and 15-digit numbers. Babbage
proposed to build a machine that would accommodate sixth degree
polynomials and 20-digit numbers but could not succeed as the British
government withdrew its support. In the meantime he came up with a more
powerful and ambitious machine which he called an Analytical engine.
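The method of finite differences that the Difference Engine mechanized reduces polynomial evaluation to repeated addition, exactly the chain of register-adder pairs described above. An illustrative sketch tabulating f(x) = x² + 2 (the sample polynomial is an assumption):

```python
# Registers hold the current value, its first difference, and its (constant)
# second difference; every step is just an addition between adjacent registers.
value, d1, d2 = 2, 1, 2   # f(0) = 2; f(1) - f(0) = 1; second difference = 2

table = []
for x in range(5):
    table.append(value)
    value += d1           # next table entry by addition only
    d1 += d2              # next first difference by addition only

print(table)  # [2, 3, 6, 11, 18], i.e. x**2 + 2 for x = 0..4
```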
Analytical engine: This machine was designed to be a general purpose
device that is capable of performing any mathematical operation
automatically. The Analytical engine's structure is as shown in Figure 3.2
below.
[Figure: the Store (memory) and the Mill (arithmetic unit), driven by
operation cards and variable cards carrying the program instructions, with
output to a printer and card punch]
Figure 3.2: Structure of Babbage’s analytical engine
The Analytical engine consists of five units of which two main units are the
Store and the Mill. To control the sequence of operations, punch cards
were used which were of the type developed earlier for Jacquard's loom.
Each of these components is described below:
The Store: a memory unit comprising sets of counter wheels.
The Mill: corresponds to a modern Arithmetic Logic Unit. It was
capable of performing the four basic arithmetic operations. It operated on
pairs of mechanical registers and produced a result stored in another
mechanical register, all of which were held in the Store (the memory
defined above).
The punch cards: constituted the computer program and were divided
into two groups:
o Operation cards: used to control the operation of the Mill. Each
operation card selected one of the four possible operations (that is,
addition, subtraction, multiplication or division) at each step in the
program.
o Variable cards: used to select the memory locations to be used by a
particular operation, that is, the source of the input operands and the
destination of the results.
Output: a printer or a card punch device, so that output data could be
either printed on a printer or punched on cards.
Large-scale automated data processing of punched cards was performed
for the US Census in 1890 by tabulating machines designed by Herman
Hollerith and manufactured by the Computing Tabulating Recording
Corporation, which later became IBM.
During the first half of the 20th century, many scientific computing needs
were met by increasingly sophisticated analog computers. However, these
were not programmable and generally lacked the versatility and the
accuracy of modern digital computers.
Computers take numerous physical forms. Early electronic computers were
the size of a large room, consuming as much power as several hundred
modern personal computers. Today, computers can be made small enough
to fit into a wrist watch and be powered from a watch battery. Society has
come to recognize the personal computer and the laptop computer as icons of
the information age. However, the most common form of computer in use
today is by far the embedded computer. Embedded computers are small,
simple devices that are often used to control other devices. For example,
they may be found in machines ranging from fighter aircraft to industrial
robots, digital cameras, and even children's toys.
The ability to store and execute programs makes computers extremely
versatile. Computers can range from personal computers to supercomputers
which are all able to perform the same computational tasks so long as time and
storage capacity are not considerations. The use of digital electronics and
more flexible programmability were important steps, in the development of
digital electronic computer.
EDSAC (Electronic Delay Storage Automatic Calculator) was one of the
first computers to implement the stored program (von Neumann)
architecture. Several developers came up with a far more flexible and
elegant design, which came to be known as the stored program
architecture or von Neumann architecture. This design was first formally
described by John von Neumann in the paper "First Draft of a Report on the
EDVAC", published in 1945. Nearly all modern computers implement some
form of the stored program architecture.
The modern definition of a computer is an electronic device that
performs calculations on data, presenting the results to humans or
other computers in a variety of useful ways. A computer is a complex
system that contains millions of elementary electronic components. Hence
a computer can be considered hierarchic in nature. A hierarchic system
is a set of interrelated subsystems, each again hierarchic in nature, until we
reach some lowest level of elementary subsystem. The hierarchic nature of
a complex system is essential to both its design and its description. At
each level the designer is concerned with the structure and function.
Definitions of few terms used in this course
Structure: It defines the way in which the components of a computer
are interrelated.
Function: It defines the operation of each individual component as a
part of the structure.
Architecture: It refers to those attributes of a computer system which
are visible to a programmer.
Organization: It refers to the operational units of a computer and their
interconnections, and how they implement the architecture of the system.
Classification of Computers
Computers can be classified into many types based on their size, speed
and cost.
The smallest machines are called microcomputers. For example, a
personal computer, that is widely used in schools, homes and business
offices.
The next higher level machine is called a minicomputer. Minicomputers
are widely used in payroll, scientific computing applications etc.
Mainframes are used for business data processing, when computing
and storage capacity is larger than what the minicomputers can handle.
Supercomputers are used for large-scale numerical calculations such
as weather forecasting, missile launching, and aircraft simulation.
We usually describe the computer system using a top-down approach, which is
clearest and most effective. In this unit we describe the structure and
function of the major components of a computer system, and then proceed
successively to the lower layers of the hierarchy in the coming units of
this book.
3.3 Structure of a Computer System
A computer is an entity that interacts in one way or another with its
external environment. All of its linkages to external environment are
classified as peripheral devices or communication lines. In particular,
the basic model of a computer is as shown in figure 3.3. It consists of four
main components, Central Processing Unit (CPU), Memory, Input /
Output and a Bus. A bus can be a wire or a communication line; in
general it can be referred to as a system interconnection.
[Figure: the CPU (ALU and control unit), memory and I/O interface
connected by a bus interface; peripherals shown include a keyboard, CRT,
disk and network]
Figure 3.3: Functional units of a computer
CPU: This is the computational unit and is the computer's heart. This
entity controls the operations of the computer and performs its data
processing functions. It is usually referred to as a processor. All the
actions of all components are controlled by the control unit of CPU.
CPU comprises two units called Arithmetic Logic Unit (ALU), where all
the arithmetic and logical operations are performed and the Control
Unit (CU) which coordinates with all other units for proper system
operation.
Memory: Memory stores the instructions, the data, and the results.
The memory unit is an integral part of a computer system; its main
function is to store the information needed by the system.
Input/output interface: These are used to move data between the computer
and its external environment. The external environment may be an
input or an output device such as a printer, display, or keyboard.
System interconnection: This is some mechanism that
provides for communication among the CPU, Memory and I/O. It is
referred to as the system bus.
Traditionally a computer system contains a single CPU, but multiprocessor
machines use multiple CPUs that share a single memory.
Central Processing Unit
The CPU is the heart, or core component, of the computer. Figure 3.4 shows
the basic functional components of the Central Processing Unit.
Figure 3.4: Basic Block diagram of CPU
Its major structural components are Control unit, ALU, Registers and CPU
interconnections.
Control Unit: It controls the operations of the CPU and hence the
computer system.
Arithmetic logic unit (ALU): It performs the computer's data processing
functions.
Registers: These form the internal memory of the CPU, providing
storage internal to the CPU.
CPU interconnections: It provides means for communication among
the control unit, ALU and registers of the CPU.
The CPU must carry out the tasks given below:
1. Read the given instructions
2. Decode them
3. Get operands for execution
4. Process the instruction
5. Give out / store the result
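The five tasks above can be sketched as a minimal fetch-decode-execute loop. This is a toy machine with a hypothetical instruction set (LOAD/ADD/STORE/HALT and a single accumulator), not any real ISA:

```python
# A minimal sketch of the five CPU tasks on a toy machine (hypothetical
# instruction set): fetch, decode, fetch operands, execute, store result.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", None),
          10: 5, 11: 7, 12: 0}   # addresses 10-12 hold data

pc, acc = 0, 0                    # program counter and accumulator
while True:
    opcode, operand = memory[pc]  # 1. read (fetch) the instruction
    pc += 1                       # remember where the next instruction is
    if opcode == "HALT":          # 2. decode: inspect the opcode
        break
    elif opcode == "LOAD":        # 3. get the operand from memory
        acc = memory[operand]
    elif opcode == "ADD":         # 4. process the instruction
        acc += memory[operand]
    elif opcode == "STORE":       # 5. give out / store the result
        memory[operand] = acc

print(memory[12])  # 12
```

Note how the program counter is the "memory" the CPU needs to remember where the next instruction comes from.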
To carry out these tasks the CPU needs to temporarily store some data. It
must remember the location of the last instruction so that it knows where
to get the next instruction. It needs to store instructions and data temporarily
while an instruction is being executed. In general, the CPU needs an internal
memory to store both instructions and data.
The CPU contains a handful of registers which act like local variables. The
CPU runs instructions and performs computations, mostly by the ALU. The
registers are the only memory the CPU has. Register memory is very fast
for the CPU to access, since it resides in the CPU itself.
However, the CPU has rather limited memory. All the local memory it uses
is in registers. It has very fast access to registers, which are on-board on the
CPU. It has much slower access to RAM.
Arithmetic logic unit (ALU)
The ALU is a collection of logic circuits designed to perform arithmetic
(addition, subtraction, multiplication, and division) and logical operations
(not, and, or, and exclusive-or). It's basically the calculator of the CPU.
When an arithmetic or logical operation is required, the values and
command are sent to the ALU for processing.
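The "values and command" model can be sketched as a table of operations, where the command selects which circuit (here, which function) processes the two inputs. The operation names are illustrative, not any particular CPU's opcodes:

```python
# A sketch of an ALU as a dispatch table: the command selects which
# "circuit" (function) combines the two input values.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "not": lambda a, b: ~a,       # unary: second input is ignored
}

def alu(command, a, b=0):
    """Send two values and a command to the ALU; return the result."""
    return OPS[command](a, b)

print(alu("add", 6, 7))            # 13
print(alu("xor", 0b1100, 0b1010))  # 6 (0b0110)
```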
Control Unit:
The purpose of the control unit is to control the system operations by
routing the selected data items to the selected processing hardware at the
right time. The control unit acts as the nerve centre for the other units.
Instruction decoder:
All instructions are stored as binary values. The instruction decoder
receives the instruction from memory, interprets the value to see what
instruction is to be performed, and tells the ALU and the registers which
circuits to energize in order to perform the function.
Registers:
The registers are used to store the data, addresses, and flags that are in
use, by the CPU.
Memory Units
Memory is basically a large array of bytes. The main function of a memory
unit is to store the information needed by the system. The information stored
can be data, instructions (that is, programs), or garbage. Memory
locations that do not contain any valid data hold arbitrary values, and
such contents are termed garbage data. The memory unit is an integral part
of a computer system.
The system performance is largely dependent on the organization, storage
capacity and speed of operation of the memory system. The CPU can read
or write to the memory, but it's much slower than accessing registers.
Nevertheless, you need memory because registers simply hold too little
information.
Most of the memory is in RAM, which can be thought of as a large array of
bytes as shown in figure 3.5. In an array, we can refer to individual elements
using an index. In computer organization, indexes are more commonly
referred to as addresses. Addresses are the numbers used to identify
successive locations. A word can be accessed by specifying its address and
a command that performs the storage or retrieval process.
Figure 3.5: System showing the registers and memory as array of bytes
The number of bits in each word is called the word length of the computer.
Large computers usually have 32 or more bits in a word, while the word
length of microcomputers ranges from 8 to 32 bits. The capacity of the
memory is one factor that decides the size of the computer. Data are usually
manipulated within the machine in units of words, multiples of words or
parts of words.
During execution the program must reside in the main memory. Instructions
and data are written into the memory or read out from the memory under the
control of a processor. Most memory is byte-addressable, meaning that
each address refers to one byte of memory.
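The array-of-bytes view can be sketched directly. Here a small `bytearray` stands in for RAM, each index is a byte address, and a 32-bit word at address A spans the four bytes A..A+3. The little-endian byte order is an assumption for the illustration:

```python
# Memory as a large array of bytes: each address names one byte
# (byte-addressable). A 32-bit word at address A occupies the four
# bytes A..A+3; we assemble it little-endian here (an assumption).
ram = bytearray(256)          # 256 bytes of "RAM"

ram[100] = 0x78               # store the word 0x12345678 at address 100
ram[101] = 0x56
ram[102] = 0x34
ram[103] = 0x12

def read_word(addr):
    """Read a 32-bit little-endian word starting at byte address addr."""
    return (ram[addr] | ram[addr + 1] << 8 |
            ram[addr + 2] << 16 | ram[addr + 3] << 24)

print(hex(read_word(100)))    # 0x12345678
```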
The bulk of the memory resides in a separate device called RAM, usually
called physical memory. RAM stores programs as well as data. The CPU
fetches an instruction from RAM into a register referred to as the
instruction register. The CPU then determines what instruction it holds and
executes it. Executing the instruction may require loading data
from RAM to the CPU or storing data from the CPU to RAM.
The time required to access one word is called the memory access time.
Classification of Memory system of a Computer
Memory system of a computer can be broadly classified into four groups.
Internal Memory
Internal memory refers to a set of CPU registers. These serve as
working memory, storing temporary results during the computation
process. They form a general purpose register file for storing the data as
it is processed. Since the cost of these registers is very high only few
registers can be used in the CPU.
Primary Memory
Primary memory, also called main memory, operates at
electronic speeds. The CPU can directly access the program stored in the
primary memory. Main memory consists of a large number of
semiconductor storage cells, each capable of storing one bit of
information. A word is a group of these cells. Main memory is organized
so that the contents of one word, containing n bits, can be stored or
retrieved in one basic operation.
Secondary Memory
This memory type is much larger in capacity and also much slower than
the main memory. Secondary memory stores system programs, large
data files and the information which is not regularly used by the CPU.
When the capacity of the main memory is exceeded, the additional
information is stored in the secondary memory. Information from the
secondary memory is accessed indirectly through the I/O programs that
transfer the information between the main memory and secondary
memory. Examples of secondary memory devices are magnetic hard
disks and CD-ROMs.
Cache Memory
The performance of a computer system will be severely affected if the
speed disparity between the processor and the main memory is
significant. The system performance can be improved by placing a small,
fast-acting buffer memory between the processor and the main memory.
This buffer memory is called cache memory. The cost of this memory is
very high.
Input / Output and I/O Interface
Any movement of information from or to the computer system is considered
as Input/Output. The CPU and its supporting circuitry provide I/O methods.
The input and output units are usually combined under the term input-output
unit (I/O). For example, consider a video terminal, which consists of a
keyboard for input and a cathode ray tube display for output.
There are a wide variety of peripherals which deliver different amounts of
data, run at different speeds and present data in different formats. All I/O
peripherals are slower than the CPU and RAM. Hence I/O units need proper
I/O interfaces.
Input Devices:
The computer accepts coded information through the input unit, which has
the capability of reading the instructions and data to be processed.
commonly used input device is the keyboard of a video terminal. This is
electronically connected to the processing part of a computer. The keyboard
is wired such that whenever a key is pressed the corresponding letter or
digit is automatically translated into its corresponding code and is directly
sent either to memory or the processor.
Output Devices
Output unit displays the processed results. Examples are video terminals
and graphic displays.
I/O devices do not alter the information content or the meaning of the data.
Some devices can be used as output only, e.g. graphic displays.
Following are the Input / Output techniques
o Programmed
o Interrupt driven
o Direct Memory Access (DMA)
System Interconnection / BUS
“A bus is a communication pathway connecting two or more devices.”
A key characteristic of a bus is that it is a shared transmission medium.
Multiple devices connect to the bus, and a signal transmitted by any one
device is available for reception by all other devices attached to the bus. If
two devices transmit during the same time period, their signals will overlap
and become garbled. Thus, only one device at a time can successfully
transmit. The communication between the external environment and CPU is
established through the System Bus. System bus is classified into three
different types, depending on whether it carries Data, Control, or Address
information and are indicated in figure 3.5.
3.4 ALU
[Figure 3.6 shows the CPU's internal components: the ALU, with its shifter,
complementer, arithmetic and Boolean logic, and status flags, connected to
the registers, instruction decoder and control unit over the internal CPU
bus.]
Figure 3.6: CPU showing internal components of ALU
The basic components of the ALU are shown in figure 3.6. The Arithmetic and
Logic Unit is the core of any processor. It performs calculations on the
input data given to it. The ALU is also used to transfer data between the
various registers, and it operates only on data held in the internal CPU
memory. The ALU is capable of performing arithmetic and Boolean operations.
An Arithmetic-Logic Unit or ALU can be considered as a combination of
various circuits in a single circuit that are used to execute data processing
instructions. The complexity of ALU is determined by the way in which its
arithmetic instructions are realized. Simple ALUs that perform fixed point
addition and subtraction, as well as logical operations can be realized by
combinational circuits.
An ALU realized using combinational logic is basically
constructed from AND, OR, and NOT gates; it is essentially an
implementation of a Boolean function. A generic diagram for a
combinational logic circuit of ALU is as shown in figure 3.7. In general, a
combinational logic circuit has inputs, which are divided into data inputs and
control inputs, and outputs. Control inputs tell the circuit what to do with the
data inputs.
Figure 3.7: A generic ALU that has 2 inputs and 1 output
A typical ALU has two input ports and a result port. It also has a
control input telling it which operation to perform: for example add,
subtract, and, or, etc. It also has additional output bits for condition
codes. Basically, these bits indicate some facts about the computation,
for example carry, overflow, negative, or zero result. These additional
output bits are together called the status bits, and they are used for
branching operations.
ALUs may be simple and perform only a few operations: Integer arithmetic
like add and subtract and Boolean logic like and, or, complement and left
Shift, right Shift, rotate. Such simple ALUs may be found in small 4- and 8-
bit processors.
Example: Consider a 32 bit ALU as shown in figure 3.8. It consists of
source 1 labeled as SRC1 and source 2 labeled as SRC2 as the two 32-bit
data inputs. It also has a control input, labeled C, which here signals
addition. These control bits tell the ALU to perform the addition operation
on the data inputs.
Figure 3.8: A 32 bit ALU
The result of the computation is sent to the output labeled DST. It is also 32
bits. There are some additional output bits labeled ST. In our example they
may indicate whether the output is zero, or has overflowed.
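The figure 3.8 example can be sketched as follows. The names SRC1, SRC2, DST and ST follow the text; the encoding of the control input C as a string, and the treatment of "overflow" as an unsigned carry-out of bit 31, are assumptions of this sketch:

```python
# A sketch of a 32-bit ALU: two 32-bit inputs (SRC1, SRC2), a control
# input C selecting the operation, a 32-bit result (DST) and status
# bits (ST) reporting zero and overflow.
MASK32 = 0xFFFFFFFF

def alu32(c, src1, src2):
    if c == "add":
        full = src1 + src2
    elif c == "sub":
        full = src1 - src2
    dst = full & MASK32                # keep only the low 32 bits
    st = {"zero": dst == 0,
          "overflow": full != dst}     # result did not fit in 32 bits
    return dst, st

dst, st = alu32("add", 0xFFFFFFFF, 1)  # wraps around to 0
print(hex(dst), st["zero"], st["overflow"])   # 0x0 True True
```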
More complex ALUs will support a wider range of integer operations like
multiplication and division, floating point operations like add, subtract,
multiply, divide. It can even compute mathematical functions like square root,
sine, cosine, log, etc.
To perform arithmetic and logic operations, the necessary operands are
transferred from memory to the ALU, where one of the operands is
stored temporarily in a register. This register is called the temporary
register. Each register stores one word of data.
3.5 Control Unit
The control unit is the portion of the processor that actually causes things
to happen. Its purpose is to control the system operations by
routing the selected data items to the selected processing hardware at the
right time; it acts as the nerve centre for the other units. This unit
decodes and translates each instruction and generates the necessary
enable signals for the ALU and other units. The control unit has two
responsibilities: instruction interpretation and instruction sequencing.
In instruction interpretation, the control unit reads an instruction from
memory, recognizes the instruction type, gets the necessary operands
and sends them to the appropriate functional unit. The signals necessary to
perform the desired operation are sent to the processing unit, and the
results obtained are sent to the specified destination.
In instruction sequencing, the control unit determines the address of the
next instruction to be executed and loads it into the program counter.
In general the I/O transfers are controlled by the software instructions that
identify both the devices involved and the type of transfer. But the actual
timing signals that govern the transfers are generated by the control circuits.
Similarly the data transfer between a processor and the memory is
controlled by the control circuits.
The operation of the computer can be summarized as below:
The computer accepts information through the input unit and transfers it
to the memory.
Information stored in the memory is fetched into arithmetic and logic unit
to perform the desired operations.
Processed information is transferred to the output unit.
All activities inside the machine are controlled by a control unit.
3.6 Bus Structure
A bus consists of one or more wires. There is usually a bus that connects
the CPU to memory and to disk and I/O devices. Real computers usually have
several buses, even though the simple computer we have modeled has only
one bus, where we consider the data bus, the address bus, and the
control bus as parts of one larger bus.
The size of the bus is the number of wires in the bus. We can refer to
individual wires or a group of adjacent wires with subscripts. A bus can be
drawn as a line with a dash across it to indicate there's more than one wire.
The dash in it is then labeled with the number of wires and the designation
of those wires.
Figure 3.9: Representation of a 32 bit Bus
For example, consider the bus shown in figure 3.9. A slanted dash across
the horizontal line indicates that it is a bus carrying multiple wires.
The dash is labeled 32, which indicates that the number of wires in
the bus is 32, and the bus is labeled A31-0, which names the individual
32 wires from A0 to A31. We can then write, say, A10-0 or A15-9 to refer to
some subset of the wires.
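Treating the 32 bus wires as the bits of an integer, a subset of wires such as A10-0 or A15-9 is just a shift and a mask. This is a software sketch of the notation, not hardware:

```python
# The subscript notation A31-0, A10-0, A15-9 names groups of wires.
# With the bus value held as an integer, extracting wires hi..lo is a
# shift followed by a mask.
def wires(value, hi, lo):
    """Extract wires A[hi]..A[lo] from a 32-bit bus value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

a = 0b1010_1100_0000_0000_0011_1111_1010_0101   # a sample 32-bit bus value
print(bin(wires(a, 10, 0)))   # the low 11 wires, A10-0
print(bin(wires(a, 15, 9)))   # the 7 wires A15-9
```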
A bus allows any number of devices to hook up to the bus. Devices
connected to the bus must share the bus. Only one device can write to it at
a time. One alternative to using a bus is to connect each pair of devices
directly. Unfortunately, for N devices, this requires about N² connections,
which may be too many. Most devices have a fixed number of connections
which doesn't permit dedicated connections to other devices. A bus doesn't
have this problem.
Data, Address, and Control Busses
There are usually 3 kinds of buses. There's a 32-bit data bus, which is used
to write or read 32 bits of data to or from memory. There's a 32-bit address
bus for the CPU to specify which address to read or write from or to memory.
Finally, there's a control bus, which may consist of a single wire or
multiple wires, to allow the CPU and memory to communicate.
For example a control signal is required to indicate when and whether a
read or write is to be performed. To support two 32-bit buses, both the
CPU and memory require 64 pins or connections: 32 for data and 32 for
address. Earlier there was a shortage of pins, and hence it was necessary to
multiplex the address and data buses: multiplexing uses the same wires as
both the address and the data bus.
There are other kinds of buses that are used primarily for I/O devices, like
USB. These are mostly high-speed buses for external devices.
3.7 Von Neumann Architecture
Figure 3.10: Structure of the IAS Computer
The IAS machine was the first digital computer in which the von Neumann
architecture was employed. The general structure of the IAS computer is
shown in figure 3.10:
A main memory, which stores both instructions and data
An arithmetic and logic unit (ALU) capable of operating on binary data
A control unit, which interprets the instructions in memory and causes
them to be executed
Input and Output (I/O) equipment operated by the control unit
The von Neumann Architecture is based on three key concepts:
1. Data and instructions are stored in a single read-write memory.
2. The content of this memory is addressable by location, without regard to
the type of data contained therein.
3. Execution occurs in a sequential fashion unless explicitly modified from
one instruction to the next.
[Figure 3.11 shows the CPU with its registers (PC, IR, MAR, MBR, I/O AR,
I/O BR) and ALU, the memory holding instructions and data, and an I/O
module. PC = Program Counter, IR = Instruction Register, MAR = Memory
address register, MBR = Memory buffer register, I/O AR = I/O address
register, I/O BR = I/O buffer register.]
Figure 3.11: Computer components of the von Neumann architecture
The CPU consists of various registers as listed in figure 3.11. They are
1. Program Counter (PC): It contains an address of an instruction to be
fetched.
2. Instruction Register (IR): It contains the instruction most recently
fetched.
3. Memory Address Register (MAR): It contains the address of a location
in memory.
4. Memory Buffer Register (MBR): It contains a word of data to be written
to memory or the word most recently read.
5. I/O Address Register (I/O AR): It contains the address of an I/O device.
6. I/O Buffer Register (I/O BR): It contains a word of data to be written
to an I/O device.
Any instruction to be executed must be present in the system memory. The
instruction is read from the memory location pointed to by the PC and
transferred to the IR through the data bus. The instruction is decoded, and
the data is brought to the ALU from memory or a register. The ALU then
performs the required operation on the data and stores the result in a
special register called the Accumulator. The whole sequence of actions is
controlled by signals generated by the control unit. Thus the Accumulator is
a special-purpose register designated to hold the result of an operation
performed by the ALU.
3.8 Summary
This unit begins by providing a historical perspective on computers. It then
discusses the classification of computers and the various components of
a computer, followed by memory, which is considered a crucial
component. Finally the unit concludes with a brief discussion of the ALU
and the bus architecture.
Self Assessment Questions:
1. ____________ are used for business data processing, when computing
and storage capacity are larger than what the minicomputers can handle.
2. ____________ refers to those attributes of a computer system which are
visible to a programmer.
3. A ____________ is an entity that interacts in some or the other way with
its external environment.
4. Bus can also be a wire or a communication line or in general it can be
referred to as a ____________.
5. ____________ performs the calculations on the input data.
3.9 Terminal Questions
1. Explain the functional units of a basic computer with a neat diagram
2. Explain Von Neumann Architecture.
3. Explain the functions of ALU.
4. Explain the System Bus structure.
5. Give a brief account of early days of computers.
6. Discuss the various types of memories.
3.10 Answers
Self Assessment Questions:
1. mainframes
2. architecture
3. computer
4. system interconnection
5. arithmetic logic unit (ALU)
Terminal Questions:
1. Refer Section 3.3
2. Refer Section 3.7
3. Refer Section 3.4
4. Refer Section 3.6
5. Refer Section 3.2
6. Refer Section 3.3
Unit 4 CPU and Register Organization
Structure:
4.1 Introduction
Objectives
4.2 Registers
User-Visible Registers
Control and Status Registers
Program Status Word (PSW)
4.3 CPU Organization
Fundamental Computer Architecture
CPU organization in 8085 microprocessor
4.4 Register Organization of different machine
The Zilog Z8000 machine
Intel 8086 machine
Motorola 68000 machine
4.5 Instruction cycles
Basic instruction cycle
Basic instruction Cycle state diagram
4.6 Summary
4.7 Terminal Questions
4.8 Answers
4.1 Introduction
As discussed in the previous unit a CPU consists of a set of registers that
function as a level of memory above Main memory and Cache memory. The
registers in the CPU are of two types:
User-visible Registers: These enable the machine or assembly
language programmer to minimize main memory references by use of
registers.
Control and Status Registers: These are used by the control unit to
control the operation of the CPU and by privileged operating system
programs to control the execution of programs.
Note: Sometimes there is no clear separation of registers into these
categories. For example, on some machines the register called the
Program Counter (PC) is user-visible, that is, the user can see the contents
of the PC, while on other machines it is invisible to the user.
Objectives:
By the end of Unit 4, the learners should be able to:
1. Explain the register organization of any basic computer.
2. Explain status word, program counter registers.
3. Discuss the register organization of Intel 8085 microprocessor and
compare with other machines.
4. Explain the different phases of basic instruction cycle.
4.2 Registers
User-Visible Registers
User-visible registers are those registers that can be referenced by the
user in machine language programs executed by the CPU. All CPU designs
provide a number of user-visible registers, which can be
categorized into the following types:
General Purpose Registers: They can be assigned a variety of
functions by the programmer. The use of general purpose registers may
be orthogonal or non-orthogonal.
o If any general purpose register can contain the operand for any
opcode, then the use of the general purpose registers is called
orthogonal.
o Sometimes the use of a general purpose register is restricted. For
example, there may be dedicated registers that are used for floating
point operations. The use of the general purpose registers is then
called non-orthogonal.
Data registers: They can only be used to hold data, and cannot be
employed in the calculation of an operand address.
Address registers: They may be used either in general purpose
addressing modes, or may be devoted to a particular addressing mode.
o Segment Pointers: In a machine CPU with segmented addressing,
a register holds the address of the base of a segment. There may be
multiple segment registers. For example, one for the operating
system, one for the current process.
For example, Intel 8086 family has different segments and they are referred
to as
1) code segment: where the code is written
2) data segment: where the data is placed
3) stack segment: acts as a stack
4) extra segment: an additional segment, typically used for extra data
Thus there may be multiple segment registers, for example one for the
operating system and one for the current process.
o Index Registers: These are used for indexed addressing, and may be
auto indexed.
o Stack Pointer: If there is user-visible stack addressing, then
typically the stack is in memory and there is a dedicated register that
points to the top of the stack. This allows implicit addressing; that is,
push, pop and other stack instructions need not contain an explicit
stack operand.
Condition codes: These registers are at least partially user-visible.
They are also referred to as flags. The bits of the flag register are set
according to the result of an operation. There can be a Zero flag, a Carry
flag, etc., which indicate whether the result is zero or whether the result
produced an overflow, respectively.
Example: Consider registers that are two bits long. The result of adding
the binary numbers 10 and 11 is 101. That is:
1) the destination register will hold the low two bits of the result, 01, and
2) the carry flag is set, as the result overflowed two bits.
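The two-bit example above can be sketched directly: the full sum 101 is split into the two-bit destination value and the carry flag:

```python
# Adding the two-bit values 10 and 11 in a two-bit machine: the full
# sum is 101; the destination register keeps the low two bits (01) and
# the carry flag records the overflow out of the register.
def add2(a, b):
    full = a + b                 # 0b10 + 0b11 = 0b101
    dest = full & 0b11           # destination register: low two bits
    carry = full > 0b11          # carry flag: result overflowed two bits
    zero = dest == 0             # zero flag: destination is all zeros
    return dest, carry, zero

dest, carry, zero = add2(0b10, 0b11)
print(format(dest, "02b"), carry, zero)   # 01 True False
```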
Control and Status Registers
These registers are employed to control the operation of the CPU.
Four registers are essential to instruction execution:
Program Counter (PC): It contains the address of an instruction to be
fetched.
Instruction Register (IR): It contains the instruction most recently
fetched.
Memory Address Register (MAR): It contains the address of a location
in memory.
Memory Buffer Register (MBR): It contains a word of data to be written
to memory or the word most recently read.
These four registers are used for the movement of data between the CPU
and memory.
Within the CPU, data must be presented to the ALU for processing. The
ALU may have direct access to the MBR and user-visible registers.
Alternatively, there may be additional buffering registers at the boundary to
the ALU; and these registers serve as input and output registers for the ALU
and exchange data with MBR and user-visible registers. This depends on
the design of the CPU and ALU.
Program Status Word (PSW):
It contains status information. PSW typically contains condition codes plus
other status information. Some of these may be user visible. Common flags
include:
Sign: contains the sign bit of the result of the last arithmetic operation.
Zero: set when the result is 0.
Carry: set if an operation resulted in a carry (addition) into or borrow
(subtraction) out of the high-order bit.
Equal: set if logical compare result is equality.
Overflow: used to indicate arithmetic overflow.
Interrupt enable/disable: used to enable or disable interrupts.
Supervisor: Indicates whether the CPU is executing in supervisor mode
or user mode. Certain privileged instructions can be executed only in
supervisor mode (e.g. the halt instruction), and certain areas of memory
can be accessed only in supervisor mode.
4.3 CPU organization
Figure 4.1: CPU with System Bus
As discussed earlier CPU which is the heart of a computer consists of
Registers, Control Unit and Arithmetic Logic Unit. The interconnection of
these units is achieved through the system bus as shown in figure 4.1.
The following tasks are to be performed by the CPU:
1. Fetch instructions: The CPU must read instructions from the memory.
2. Interpret instructions: The instructions must be decoded to determine
what action is required.
3. Fetch data: The execution of an instruction may require reading data
from memory or an I/O module.
4. Process data: The execution of an instruction may require performing
some arithmetic or logical operations on data.
5. Write data: The results of an execution may require writing data to the
memory or an I/O module.
Fundamental Computer Architectures
Here we describe the most common computer architectures, all of which
use the stored-program control concept.
The three most popular computer architectures are:
1) The Stack Machine
2) The Accumulator Machine
3) The Load / Store Machine
The Stack Machine
A stack machine implements a stack with registers. The operands of the
ALU are always the top two registers of the stack and the result from the
ALU is stored in the top register of the stack. Examples of the stack machine
include Hewlett-Packard RPN calculators and the Java Virtual Machine
(JVM).
Figure 4.2: The Stack Machine Architecture
The advantage of a stack machine is that it can shorten the length of
instructions since operands are implicit. This was important when memory
was expensive (20–30 years ago). Now, in Java, it is important since we
want to ship executables (class files) over the network.
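The stack machine idea can be sketched as a tiny interpreter: operands are implicit (the top two entries of the stack), so only push-style instructions carry an operand field. The instruction names are illustrative, not those of any real stack machine:

```python
# A sketch of a stack machine: the ALU operands are always the top two
# stack entries, and the result goes back on top of the stack.
def run_stack(program, memory):
    stack = []
    for op, *arg in program:
        if op == "push":          # push a constant
            stack.append(arg[0])
        elif op == "load":        # push memory[addr]
            stack.append(memory[arg[0]])
        elif op == "add":         # operands are implicit: top two entries
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":         # pop the result into memory[addr]
            memory[arg[0]] = stack.pop()
    return memory

mem = {"y": 32}                   # compute y = y + 10
run_stack([("load", "y"), ("push", 10), ("add",), ("pop", "y")], mem)
print(mem["y"])   # 42
```

Note how `add` needs no operand field at all, which is what keeps stack-machine instructions short.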
The Accumulator Machine
An accumulator machine has a special register, called an accumulator,
whose contents are combined with another operand as input to the ALU,
with the result of the operation replacing the contents of the accumulator.
accumulator = accumulator [op] operand;
Figure 4.3: The Accumulator Machine Architecture
In fact, many machines have more than one accumulator:
Pentium: 1, 2, 4 or 6 (depending on how you count)
MC68000: 16
In order to add two numbers in memory,
1) Place one of the numbers into the accumulator (load operand)
2) Execute the add instruction
3) Store the contents of the accumulator back into memory (store operand)
The Load/Store Machine
Registers: Provide faster access but are expensive.
Memory: Provides slower access but is less expensive.
A small amount of high speed memory (expensive), called a register file, is
provided for frequently accessed variables and a much larger but slower
memory (less expensive) is provided for the rest of the program and data.
(SPARC: 32 registers visible at any one time)
This is based on the principle of "locality": at a given time, a program
typically accesses a small number of variables much more frequently than
others.
Figure 4.4: The Load / Store Machine Architecture
The machine loads and stores the registers from memory. The arithmetic
and logic instructions operate with registers, not main memory, for the
location of operands.
Since the machine addresses only a small number of registers, the
instruction field to refer to a register (operand) is short; therefore, these
machines frequently have instructions with three operands:
ADD src1, src2, dest
Example machine instructions for y = y + 10
(Notation: [y] denotes the contents of memory location y, and &y its address.)

Stack Machine        Accumulator Machine      Load / Store Machine
Push [y]             Load [y]                 Load r0, [y]
Push 10              Add 10                   Load r1, 10
Add                  Store y                  Add r0, r1, r2
Pop y                                         Store r2, y
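The load/store sequence can be simulated with a small register file (r0–r2 here are illustrative names, not a real ISA):

```python
# Sketch of the load/store machine sequence for y = y + 10.
memory = {"y": 32}
reg = [0, 0, 0]                     # small register file: r0, r1, r2

reg[0] = memory["y"]                # Load r0, [y]
reg[1] = 10                         # Load r1, 10  (immediate value)
reg[2] = reg[0] + reg[1]            # Add  r0, r1, r2  (three register operands)
memory["y"] = reg[2]                # Store r2, y
print(memory["y"])                  # 42
```

Because register numbers are short, naming three operands in one instruction is affordable, which is why load/store machines favour three-operand arithmetic.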
CPU organization of 8085 microprocessor
In the following sections we will take a look at the CPU organization of the
Intel 8085. We will study the register organization and its relationship with
other components in this relatively simple processor. The Intel 8085 CPU
organization is shown in figure 4.5.
The CPU of 8085 is organized around a single 8-bit internal bus. Connected
to the Bus are the following components.
1. Accumulator: Used as a temporary buffer to store input to the ALU. It is
also user visible and is addressed in some instructions.
2. Temporary register: Used as the second input to the ALU, i.e., the input
other than the accumulator.
3. Flags: These are nothing but condition codes. The results of ALU
operations are reflected back in this register as flags.
Figure 4.5: Intel 8085 CPU organization
The different flag bits supported by the 8085 CPU are:
1. Z: It stands for Zero Flag. This bit is set when the result of ALU
operation is zero. That is result = zero implies Z=1 and result = non zero
implies Z=0.
2. S: It stands for Sign Flag. This bit is set when the result of ALU
operation is negative
That is result = negative implies S=1 and result = non negative implies
S=0
3. P: It stands for Parity Flag. This bit is set when the result of ALU
operation results in even parity. That is the result when expressed in
binary has even number of ones in it.
That is result contains even number of ones implies even parity P=1 and
result contains odd number of ones implies odd parity P=0
4. CY: It stands for Carry Flag. This bit is set when the result of ALU
operation results in overflow.
That is result contains overflow implies carry flag (CY) =1 and result
does not contain overflow implies CY=0
5. AC: It stands for Auxiliary Carry Flag. This bit is set when the ALU
operation produces a carry from the fourth bit to the fifth bit (i.e., from
the lower nibble to the upper nibble).
That is, a carry from the fourth to the fifth bit implies AC = 1, and no
such carry implies AC = 0.
4. ALU output: The result of an operation is placed on the bus.
5. Instruction register: It is loaded from the memory buffer register
(MBR) through the bus. The instruction is brought into the CPU through
the address/data buffer and the internal CPU bus to the instruction
register, where it is decoded and executed under the direction of the
control unit.
6. Register Array: It is a collection of registers that are user visible and
help to store the operands or result of the operation.
The register array contains five 16-bit registers: the B – C, D – E and H – L
register pairs, the stack pointer and the program counter.
1. The first three types of registers can be treated as a single 16 bit register
each or two 8-bit registers each. That is B, C, D, E, H, L are six registers
which are 8-bit long.
2. The stack pointer is a 16-bit register that points to the top of the stack
in memory. It is used for stack-oriented operations.
3. The program counter is a 16-bit register that points to the memory
location of the next instruction to be fetched. It is moved to the MAR
(the address buffer plus the address/data buffer) to begin fetching the
next instruction.
Any of these registers may be loaded, 8-bits at a time, from the internal bus
or transferred to the address buffer or address/data buffer. Also any of the
16-bit registers can be transferred to the MAR that connects to the external
address bus.
7. Address/data Buffer: This buffer connects to multiplexed bus lines.
Some of the time, this buffer acts as a memory buffer register (MBR) to
exchange data with the system bus. At other times it acts as the low
order 8 bits of a memory address register (MAR). This multiplexing frees
pins, which are then available for control signals to the system bus.
8. Address buffer: Used as the high-order 8 bits of the MAR.
Let us consider an instruction ADD B as an example to study the sequence
of actions that takes place.
This instruction causes the contents of register B to be added to the
contents of the accumulator.
To execute this instruction, the control unit moves the contents of
register B to the temporary register.
The ALU performs the add operation on its inputs.
The result is placed on the internal CPU bus and copied into the
accumulator.
The ALU also updates the flags to reflect the results of add.
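The flag update in this last step can be sketched as follows (illustrative Python, not the actual 8085 circuitry; the operand values are made up):

```python
# Sketch of deriving the five 8085 flags from an 8-bit addition.
def add_with_flags(a, b):
    total = a + b
    result = total & 0xFF                            # keep the low 8 bits
    flags = {
        "Z":  int(result == 0),                      # result is zero
        "S":  int(result & 0x80 != 0),               # bit 7 set => negative
        "P":  int(bin(result).count("1") % 2 == 0),  # even number of ones
        "CY": int(total > 0xFF),                     # carry out of the top bit
        "AC": int(((a & 0x0F) + (b & 0x0F)) > 0x0F), # carry from lower nibble
    }
    return result, flags

result, flags = add_with_flags(0x3A, 0xC6)           # 0x3A + 0xC6 = 0x100
print(result, flags)
```

With these operands the sum wraps to zero, so Z, P, CY and AC are all set while S is clear.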
4.4 Register Organization of different machines
It is instructive to examine and compare the register organizations of
comparable systems. In this section we discuss the register organization
of three 16-bit microprocessors that were developed at more or less the
same time.
The Zilog Z8000
Figure 4.6 depicts the register organization of the Z8000 machine. Only the
purely internal register structure is given; memory address registers are
not shown.
Figure 4.6: Zilog Z8000 register organization
The Z8000 has sixteen 16-bit general-purpose registers, which can be
used for data, addresses and indexing. The designers of this machine felt
that it was more useful to provide a regularized, general set of registers
than to save instruction bits by using special-purpose registers. How
functions are assigned to these registers is left to the programmer;
different applications may use different functional breakdowns.
A segmented address space uses a 7-bit segment number and a 16-bit
offset, so two registers are needed to hold a single address. There are two
other registers, called stack pointers, that are needed for stack operations:
one is used in system mode and the other in normal mode.
The Z8000 has five other registers related to program status. Two registers
hold the program counter and two hold the address of a program status
area in memory. A 16-bit register called the flag control word holds various
status flags and control bits.
Intel 8086
The approach taken by Intel for register organization is different than Z8000
machine and is as shown in Figure 4.7.
Figure 4.7: Intel 8086 register organization
In this machine every register is a special-purpose register, though some
registers also serve as general-purpose registers. The 8086 contains four
16-bit data registers that are accessible on a byte or 16-bit basis. It also
has four 16-bit pointer and index registers. The data registers
can be used as general-purpose registers in some instructions; in others,
the registers are used implicitly.
Example: A multiply instruction always uses the accumulator. The four
pointer registers, each holding a segment offset, are also used implicitly in
a number of operations. There are also four segment registers. Three of these segment
registers are used in a dedicated, implicit way to point to the segment of the
current instruction, a segment containing data and a segment containing a
stack. This type of structure is useful in branch operations. The dedicated
and implicit uses provide for compact encoding at the cost of reduced
flexibility. The 8086 also includes an instruction pointer and a set of 1-bit
status and control flags.
The Motorola MC68000
DATA REGISTERS ADDRESS REGISTERS
D0 A0
D1 A1
D2 A2
D3 A3
D4 A4
D5 A5
D6 A6
D7 A7
PROGRAM STATUS
Program counter
Status register
Figure 4.8: MC68000 register organization
This machine uses a structure that falls between the Zilog Z8000 and Intel
8086. The register organization of MC68000 is as shown in figure 4.8
The MC68000 machine partitions its 32-bit registers into eight data
registers and nine address registers. The data registers are used for data
manipulation, and in addressing only as index registers. The width of the
data registers allows 8-bit, 16-bit and 32-bit data operations, depending
upon the opcode.
The address registers contain 32-bit addresses; the machine does not
support segmentation. Two of the address registers are used as stack
pointers, one for user programs and one for the operating system,
depending upon the current execution mode. Both stack pointers are
numbered seven (A7), since only one can be in use at a time. The
MC68000 also has a program counter and a status register, as in the
other two machines. The program counter is a 32-bit register and the
status register is 16 bits wide. Like the Zilog team, the Motorola designers
favoured a regular instruction set with no special-purpose registers; for
code efficiency they divided the registers into functional groups, saving
one bit in each register specifier.
4.5 Instruction Cycles
The basic function performed by a computer is program execution. The
program to be executed consists of a set of instructions stored in memory.
The CPU does the actual work by executing the instructions written in the
program.
Basic Instruction cycle
To understand the ways the major components of a computer or rather CPU
interact to execute a program, consider the flow chart as shown in figure 4.9.
Figure 4.9: Basic instruction cycle
The processing required for a single instruction is called an instruction
cycle. Referring to the above figure 4.9, the basic instruction cycle can be
assumed to consist of two steps:
1. The CPU reads or fetches instructions from memory one at a time.
2. Then this single instruction is executed.
The program execution consists of repeating these two steps of instruction
fetch and instruction execution until the end of the program.
Thus the instruction cycle consists of a fetch cycle and an execute cycle.
The execution of an instruction itself may involve a number of steps. The
instruction fetch is a common operation for every instruction, and consists
of reading an instruction from a location in memory. The execution may
involve several operations, depending upon the nature of the instruction.
As we have seen, every CPU has a register called the program counter
(PC) that keeps track of the instruction to be fetched next from memory.
The CPU increments the PC after each instruction fetch so that it fetches
the next instruction in sequence.
The fetched instruction is stored in a register in the CPU known as
instruction register. The instruction is in the form of binary code that
specifies what action has to be taken by the CPU. The CPU interprets the
instruction and performs the required action.
Example: Suppose PC = 300, i.e., the PC points to location 300 of
memory. A program has instructions at locations 300, 303 and 304, with a
Halt instruction at 306. Discuss the steps involved in the execution of this
program.
Solution:
1. First, the instruction present at location 300 in memory is loaded into
the IR. At the same time the PC is incremented by 3, as the next
instruction is at 303.
2. Depending on the instruction that is decoded, the CPU finds that
operands are to be read from memory (the next instruction being at
303 implies that the operands of the first instruction are placed at 301
and 302).
3. Then the instruction is executed.
4. Then the instruction at 303 is read into the IR and the PC is updated to 304.
5. The instruction is decoded and executed. (This time no operands are
in memory, since the next instruction is at 304.)
6. The third instruction is read into the IR from location 304 (PC = 304),
and the PC is updated to 306.
7. The instruction is decoded and may have an operand in memory, so
the operand is fetched from location 305.
8. Then the instruction is executed.
9. Finally, the next instruction is at 306, so it is read into the IR and
PC = 307.
10. This instruction is decoded and found to be Halt.
11. Executing it terminates the program.
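The fetch–execute loop of this worked example can be sketched as follows. The instruction encoding (an opcode name plus an operand count) and the opcodes ADD/NOP/HALT are invented purely for illustration:

```python
# Sketch of the fetch-execute cycle for the program at locations 300..306.
memory = {
    300: ("ADD", 2), 301: 5, 302: 7,   # instruction with two memory operands
    303: ("NOP", 0),                   # instruction with no operands
    304: ("ADD", 1), 305: 9,           # instruction with one memory operand
    306: ("HALT", 0),
}
pc, acc, trace = 300, 0, []

while True:
    ir = memory[pc]                    # fetch the instruction into the IR
    opcode, n_operands = ir            # decode
    operands = [memory[pc + 1 + i] for i in range(n_operands)]
    pc = pc + 1 + n_operands           # update PC past instruction and operands
    trace.append((opcode, pc))
    if opcode == "ADD":                # execute
        acc += sum(operands)
    elif opcode == "HALT":
        break
print(trace)
```

Running this, the PC takes the values 303, 304, 306 and finally 307, matching steps 1–11 above.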
The following is one view of the instruction cycles:
Fetch the instruction:
Place it in a holding area called IR.
Update the program counter (PC).
Execute the instruction:
Decode the instruction:
o Perform address translation.
o Fetch operands from memory.
Perform the required calculation.
Store results in registers or memory.
Set status flags attached to the CPU.
The required action performed by the CPU falls into four categories.
1. CPU-Memory: Data may be transferred from the CPU to memory or
from memory to CPU
2. CPU-I/O: Data may be transferred to/from the outside world by
transferring between CPU and I/O module.
3. Data processing: The CPU may perform some arithmetic or logic
operations on data.
4. Control: An instruction may specify that the sequence of execution be
altered; for example, a jump instruction.
Consider, for example, a program written from location 300 to 320 in
memory, where the instruction at 305 is "jump to 310". When the program
is executed, the instructions located at 300 to 305 are executed in
sequence, one at a time. After the execution of the jump instruction, the
CPU has updated PC = 310, so the next instruction fetched will be from
310, and so on.
Basic Instruction Cycle State diagram
The detailed states occurring during the fetch and execute cycles are
shown in figure 4.10. A few of them occur only once per instruction, while
others may occur multiple times.
Figure 4.10: Instruction Cycle State Diagram
The different possible states occurring in Instruction cycle are:
Instruction Address Calculation (IAC): Determine the address of the
next instruction to be executed. Usually, this involves adding a fixed
number to the address of the previous instruction. For example, if each
instruction is 16 bits long and memory is organized into 16-bit words,
then add 1 to the previous address. If, instead, memory is organized as
individually addressable 8-bit bytes, then add 2 to the previous address.
Instruction Fetch (IF): Read instruction from its memory location into
the processor.
Instruction Operation Decoding (IOD): Analyze instruction to
determine type of operation to be performed and operand(s) to be used.
Operand Address Calculation (OAC): If the operation involves
reference to an operand in memory or available via I/O, then determine
the address of the operand.
Operand Fetch (OF): Fetch the operand from memory or read it in, from
I/O.
Data Operation (DO): Perform the operation indicated in the instruction.
Operand Store (OS): Write the result into memory or out to I/O.
During the instruction cycle, instruction address calculation, instruction
fetch and instruction operation decoding are performed only once per
instruction, but operand address calculation and operand fetch may
happen multiple times. The data operation is performed once, and
operand store (together with another operand address calculation) may
be performed zero or more times, depending on the instruction.
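The repetition pattern described above can be expressed as a short sketch that lists the states visited by one instruction (the operand counts passed in are illustrative):

```python
# Sketch of the instruction-cycle state diagram as a flat state sequence.
def instruction_states(n_fetch_operands, n_store_operands):
    states = ["IAC", "IF", "IOD"]      # performed once per instruction
    for _ in range(n_fetch_operands):
        states += ["OAC", "OF"]        # may repeat, once per operand fetched
    states.append("DO")                # the data operation itself, once
    for _ in range(n_store_operands):
        states += ["OAC", "OS"]        # zero or more result stores
    return states

# An instruction fetching two operands and storing one result:
print(instruction_states(2, 1))
```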
4.6 Summary
We know that CPU consists of a set of registers that function as a level of
memory. Here we have discussed the different types of registers like user
visible registers, control registers or flag registers and so on. We have also
introduced the basic instruction cycle so as to get a feel for how exactly
these registers are used in the execution of an instruction. As a case study
we have seen the detailed register organization of a few machines: the
Zilog Z8000, Intel 8086 and Motorola MC68000 microprocessors.
Self Assessment Questions
1. CPU consists of a set of registers that function as a level of memory
above ______________.
2. ______________ enable the machine or assembly language
programmer to minimize main memory references by use of registers.
3. Flags of 8085 are nothing but ______________.
4. ______________ number of flag bits is defined in 8085.
5. The processing required for a single instruction is called an
______________ .
4.7 Terminal Questions
1. List and explain the various condition flags of the Intel 8085
microprocessor.
2. Explain the register organization of 8086.
3. Compare the register organization of 8085, Z8000 and MC68000.
4.8 Answers
Self Assessment Questions:
1. main memory and cache memory
2. user visible registers
3. conditional codes
4. five
5. instruction cycle
Terminal Questions:
1. Refer Section 4.3
2. Refer Section 4.4
3. Refer Section 4.4
Computer Organization and Architecture Unit 5
Sikkim Manipal University - DDE Page No. 122
Unit 5 Interconnection Structures
Structure:
5.1 Introduction
Objectives
5.2 Types of exchange of information
Modules of a System
Different types of transfers
5.3 Types of Buses
5.4 Elements of Bus Design
Bus Types
Method of arbitration
Bus Timing
Bus width
Bus Speed
5.5 Bus Structure
Single Bus System
Two Bus Organization
The Bus Standard
5.6 Summary
5.7 Terminal Questions
5.8 Answers
5.1 Introduction
Different types of exchanges are required for communication in a computer.
Figure 5.1 shows the Memory Module, I/O Module and CPU Module and
also indicates the major forms of inputs and outputs of these modules.
Objectives:
By the end of Unit 5, you should be able to:
1. Discuss the different means of data transfer.
2. Define BUS, and explain its structure.
3. Explain the elements of Bus design.
4. Explain with neat sketches single and two bus structure.
5.2 Types of exchange of information
A computer consists of three basic types of modules (processor, memory,
I/O) that communicate with each other. Thus, there must be paths for
connecting the modules. The collection of paths is called the
"interconnection structure". The design of this structure will depend on
the exchanges that must be made between modules.
Modules of a system
Figure 5.1: a) Memory module
A memory module consists of N words of equal length. Each word is
assigned a unique numerical address (0, 1, ..., N-1). Figure 5.1(a) shows
the possible inputs and outputs of a memory module. A word of data can
be read from or written into the memory depending on whether the control
signal is Memory
Read (MR) or Memory Write (MW). The location of the memory is provided
by the input called as Address.
Example: List the inputs and outputs that are necessary to write the data to
the memory.
Solution:
The inputs required are:
1. The control signal, which is Memory Write (MW),
2. The address, which gives the location in memory where the data is to
be written, and
3. The data itself.
No output is required for this particular task.
The sequence of operations that takes place:
1. The memory location is selected using the input address.
2. Since the control signal is MW, the input data is placed in that location.
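The memory-module interface above can be sketched as a small class (a simplification: real memory is driven by bus signals, not method calls, and the class name is illustrative):

```python
# Sketch of the memory module: address input, data input/output,
# and MR/MW control signals selecting the operation.
class MemoryModule:
    def __init__(self, n_words):
        self.words = [0] * n_words          # words addressed 0 .. N-1

    def cycle(self, control, address, data=None):
        if control == "MW":                 # Memory Write: store data, no output
            self.words[address] = data
            return None
        if control == "MR":                 # Memory Read: output the addressed word
            return self.words[address]

mem = MemoryModule(8)
mem.cycle("MW", address=3, data=42)         # write 42 into location 3
print(mem.cycle("MR", address=3))           # read it back: 42
```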
I/O Module
I/O is functionally similar to memory. The possible inputs and outputs for a
typical I/O module are shown in figure 5.1(b).
Figure 5.1: b) I/O Module
There are two operations: read and write. An I/O module may control more
than one external device. Each interface to an external device is referred
to as a port, and each port is given a unique address (0, 1, ..., M-1). There
are also external data paths for the input and output of data with an
external device. In addition, an I/O module may be able to send interrupt
signals to the CPU.
CPU Module
The possible inputs and outputs for a typical CPU module are shown in
figure 5.1(c). The CPU reads in instructions and data, and writes out data
after processing. It uses control signals to control the overall operation of
the system. It also receives interrupt signals and takes the appropriate
action.
Figure 5.1: c) CPU Module
Different types of transfers
There are two types of transfers that are classified; one depending on the
devices that are to be interconnected and the other depending on whether it
carries information one bit or several bits.
Types of transfers between different modules
Following are the possible ways of transfer of information between the three
modules of the system:
Memory to Processor (CPU): The processor reads an instruction or a
unit of data from memory.
CPU to Memory: The processor writes a unit of data to memory.
I/O to CPU: The processor reads data from an I/O device via an I/O
module.
CPU to I/O: The processor sends data to the I/O device.
I/O to or from memory: For these two cases, an I/O module is allowed
to exchange data directly with memory, without going through the
processor, using Direct Memory Access (DMA).
Serial & Parallel transfer
A bus consists of multiple communication lines (or pathways). Each line is
capable of transmitting signals representing binary 1 or binary 0. Bits are
transmitted in Serial & Parallel. When only one bit is carried at a time it
requires only a single wire as shown in figure 5.2. When we consider many
bits to be transmitted at a time it requires many wires as shown in figure 5.3.
These many wires together constitute a bus and the mode of transmission is
termed as parallel transmission.
Figure 5.2: Serial transmission - one bit at a time
Figure 5.3: Parallel transmission - several (8 bits here) bits at a time
The following table compares serial and parallel transmission:

Factor       Serial mode                    Parallel mode
Cost         Less costly (only one wire)    More costly (many wires)
Speed        Low (only 1 bit at a time)     High (more bits at a time)
Throughput   Low                            High
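The difference between the two modes can be sketched as follows (an illustration of the clock-cycle cost only, not a real transmission protocol):

```python
# Sketch: serial mode sends one bit per clock cycle over a single line;
# parallel mode presents all 8 bits in one cycle on lines D0..D7.
def serial_send(byte):
    # 8 clock cycles, one bit per cycle, least significant bit first
    return [(byte >> i) & 1 for i in range(8)]

def parallel_send(byte):
    # 1 clock cycle: all 8 bits appear at once on the 8 lines
    return [[(byte >> i) & 1 for i in range(8)]]

print(len(serial_send(0xA5)), "clock cycles in serial mode")      # 8
print(len(parallel_send(0xA5)), "clock cycle in parallel mode")   # 1
```

The same byte thus costs eight bus cycles serially but only one in parallel, which is the throughput difference in the table above.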
5.3 Types of Buses
As discussed earlier in unit 1, a system bus consists of a data bus, a
memory address bus and a control bus. The interconnection between the
modules of a system is as shown in figure 5.4
Figure 5.4: Bus Interconnection scheme
Data Bus: A bus which carries a word to or from memory is called a data
bus. Its width is equal to the word length of the memory. It provides a
means for moving data between the different modules of a system. The
data bus usually consists of 8, 16 or 32 separate lines; the number of lines
determines the width of the data bus.
Address bus: A bus used to carry the address of data in memory. Its width
is equal to the number of bits in the Memory Address Register (MAR) of
the memory.
Example: If a computer memory has 64K, 32-bit words, then the data bus
will be 32-bits wide and the address bus will be 16-bits wide.
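This example can be checked numerically; the helper below simply computes the number of address bits needed for a given number of words (the function name is illustrative):

```python
# Sketch: the address bus must have enough bits to name every word.
import math

def address_bus_width(n_words):
    # bits needed so that 2**bits >= n_words
    return math.ceil(math.log2(n_words))

print(address_bus_width(64 * 1024))   # 64K words  -> 16-bit address bus
print(address_bus_width(2**24))       # 16M words  -> 24-bit address bus
```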
Control bus: A bus that carries the control signals between the various
units of the computer. The processor has to send READ and WRITE
commands to the memory, each of which requires a single wire. A START
command is necessary for the I/O units. All these signals are carried by
the control bus.
Types of Control Lines
Memory Write: Causes data on the bus (data bus) to be written into the
addressed location.
Memory Read: Causes data from the addressed location to be placed
on the bus (data bus).
I/O Write: Causes data on the data bus to be output to the addressed
I/O port.
I/O Read: Causes data from the addressed I/O port to be placed on the
bus (data bus).
Transfer ACK: Indicates that data has been accepted from or placed on
the bus.
Bus Request: Indicates that a module needs to gain control of the bus.
Bus Grant: Indicates that a requesting module has been granted control
of the bus.
Interrupt Request: Indicates that an interrupt is pending.
Interrupt ACK: Acknowledges that the pending interrupt has been
recognized.
Clock: Used to synchronize operations.
Reset: Initializes all modules.
5.4 Elements of Bus Design
Bus Types: Bus lines can be separated into two types.
Dedicated: Permanently assigned to either one function or to a
physical subset of components.
Functional dedication: Bus has a specific function.
Example: Three buses are identified for carrying address, data and
control signals, as seen earlier: the Address Bus, Data Bus and
Control Bus.
Physical dedication: Refers to the use of multiple buses, each of
which connects only a subset of components using the bus.
Example: I/O buses are used only to interconnect all I/O modules.
And this bus is then connected to the main bus through some type of
an I/O adapter module.
Advantage of Physical dedication:
It offers high throughput because there is less bus contention.
Disadvantage of Physical dedication:
Increased size and cost of the system
Multiplexed: These are also referred to as non-dedicated. Same bus
may be used for various functions. The method of using the same bus
for multiple purposes is known as Time Multiplexing.
Example: Discuss the steps of actions that are to be performed so that
address and data information may be transmitted over the same set of lines.
Solution: Assume an additional control signal, called the Address Line
Enable (ALE) line, is used.
List of operations are as follows:
1. At the beginning of the data transfer the address is placed on the bus
with the ALE line activated.
2. Each module is given sufficient period of time to copy the address and
determine if it is the addressed module.
3. The address is then removed from the bus, and then same bus
connections are used for subsequent read and write data transfer with
the ALE signal deactivated.
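The three steps can be sketched as follows (a simplification that just records what the shared lines carry while ALE is active and then inactive; the sample address and data values are made up):

```python
# Sketch of time-multiplexing address and data over one set of bus lines
# using an ALE control signal.
bus_history = []                       # (ALE, value on the shared lines)

def bus_cycle(address, data):
    bus_history.append((1, address))   # ALE active: lines carry the address
    bus_history.append((0, data))      # ALE inactive: same lines carry data

bus_cycle(address=0x2050, data=0x7F)
print(bus_history)
```

Each transfer therefore occupies two phases on the same lines, which is the cost multiplexing pays for using fewer wires.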
Advantages: Multiplexing uses fewer lines, which saves space and cost.
Disadvantages of multiplexing:
1. More complex circuitry required in each module.
2. There is a potential reduction in performance, as certain events that
share the same bus cannot take place in parallel.
Method of Arbitration:
Centralized: A single hardware device (referred to as bus controller or
arbiter) is responsible for allocating time on the bus to various
components.
Distributed: In a distributed scheme, there is no central controller.
Each module contains access control logic and the modules act together
to share the bus.
In both methods, the purpose is to designate one device (either the
processor or an I/O module) as master. The master may then initiate a data
transfer with some other device which acts as slave for this particular
exchange.
Bus Timing
Synchronous: The occurrence of events on the bus is determined by a
clock; the bus includes a clock line. A single 1-0 transition of the clock
signal is referred to as a "clock cycle" or "bus cycle" and defines a time
slot. Most events occupy a single clock cycle, but some require more
cycles.
Asynchronous: The occurrence of one event on a bus follows and
depends on the occurrence of a previous event.
Synchronous timing is simpler to implement and test; however it is less
flexible.
With asynchronous timing, a mixture of slow and fast devices can share
a bus.
Bus Width:
Bus Width of Address Lines: Determines the number of memory
locations that can be addressed.
Bus Width of Data Lines: Determines the size of the unit of data
transferred at a time (8, 12, 16, 32 or 64 bits).
For example, 24 address bits can address 2^24 locations, 32 bits can
address 2^32 locations, and so on.
Bus Speed:
One important attribute of a bus is its speed. The speed of a bus refers to
how fast the data on the bus can be changed while still allowing devices to
read the values correctly. Bus speed can limit how fast a CPU can
communicate with memory, and the width of a bus can limit the speed too.
Example: The speed can be measured in, say, MHz, where 1 MHz
corresponds to 10^6 changes per second.
Data Transfer Type:
Read: (Slave to Master)
Write: (Master to Slave)
Read-Modify-Write: A read followed immediately by a write to the same
address. Usually indivisible operation to prevent any access to data by
other potential bus masters.
Read-After-Write: Indivisible operation consisting of a write followed
immediately by a read of the same address. Generally for checking
purposes.
Block: In this case, one address cycle is followed by n data cycles. The
first data item is transferred to/from the specified address, the remaining
data items are transferred to/from subsequent addresses.
5.5 Bus Structure
If a large number of devices are connected to the bus, the performance will
suffer. There are two main causes:
1. In general, the more devices attached to the bus, the greater is the bus
length, and hence the greater is the propagation delay.
2. The bus may become a bottleneck as the aggregate data transfer
demand approaches the capacity of the bus.
To overcome these problems, most modern computers have multiple
buses.
Single Bus System
In this type of interconnection, the three units share a single bus, so
information can be transferred between only two units at a time. Here the
I/O units use the same memory address space, which simplifies the
programming of I/O units since no special I/O instructions are needed.
This is one of the advantages of the single-bus organization.
Figure 5.5: Single-Bus Organization
The transfer of information over a bus cannot be done at a speed
comparable to the operating speed of all the devices connected to the bus.
Some electromechanical devices such as keyboards and printers are very
slow whereas disks and tapes are considerably faster. Main memory and
processors operate at electronic speeds. Since all the devices must
communicate over the bus, it is necessary to smooth out the differences in
timings among all the devices.
A common approach is to include a buffer register with each device to
hold the information during transfers. To illustrate this, consider the
transfer of an encoded character from the processor to a character printer,
where it is to be printed. The processor sends the character to the printer's
output buffer register over the bus. Since the buffer is an electronic
register, this transfer requires relatively little time. The printer then starts
printing; at this point the bus and the processor are no longer needed and
can be released for other activities. The buffer register is not available for
other transfers until the printing is completed. Thus the buffer register
smooths out the timing differences among the processor, memory and I/O
devices. This allows the processor to switch rapidly from one device to
another, interleaving its processing activity with data transfers involving
several I/O devices.
Two Bus Organization
Figure 5.6 shows inter-connection of various computer units through two
independent system buses. Here the I/O units are connected to the
processor through an I/O bus and the processor is connected to the memory
through the memory bus.
Figure 5.6: Two-Bus Organization
The I/O bus consists of device address bus, data bus and a control bus.
Device address bus carries the address of the I/O units to be accessed by
the processor. The data bus carries a word from the addressed input unit to
the processor and from the processor to the addressed output unit. The
control bus carries control commands such as START, STOP etc., from the
processor to I/O units and it also carries status information of I/O units to the
processor. Memory bus also consists of a Memory Address Bus (MAB),
data bus and a control bus.
In this type of inter-connection the processor completely supervises the
transfer of information to and from the I/O units. All the information is first
taken to the processor and from there to the memory. Such a data transfer
is called program-controlled transfer.
An alternative two bus structure:
There is one more method to connect processor to the I/O units. Figure 5.7
shows an alternative two bus structure. Here the I/O units are directly
connected to the memory.
Figure 5.7: Alternative Two-Bus Organization
Here the I/O devices are connected to special interface logic known as
Direct Memory Access (DMA) logic or an I/O channel, also called a
Peripheral Processing Unit (PPU). The processor issues a READ or
WRITE command giving the device address, the address of the memory
location where the data read from the input unit is to be stored or from
where the data is to be taken to output units, and the number of data words
to be transferred. This command is accepted by the PPU, which now takes
the responsibility of data transfer.
The Bus Standard
In 1984 IBM was shipping its PC AT model. The CPU, memory and I/O bus
all shared a common 8MHz clock. This became the basis for all subsequent
clone computers. The term “AT” is a registered trademark of IBM, so this I/O
bus became known as the ISA (Industry Standard Architecture) bus.
Every currently marketed PC supports some ISA interface slots. The bus
and matching adapter cards are simple and cheap. ISA is a 16-bit interface,
which means that data can be transferred only two bytes at a time. More
importantly, the ISA bus runs at only 8 MHz and it typically requires two or
three clock ticks to transfer those two bytes of data. This is not a problem for
devices that are inherently slow like the COM port (modem), the printer port,
the sound card or the CD-ROM. However, the ISA bus is too slow for high
performance disk access and therefore is not acceptable in Servers. It is
also too slow for modern Windows display adapters.
The PCI (Peripheral Component Interconnect) bus was developed by Intel.
Although it is mostly known for its CPUs, Intel also has a historical
association with Ethernet, multimedia and some disk interfaces. So Intel
was unhappy with the VLB concentration on just the video interface and
wanted to develop a general purpose bus. The objective was an interface
that was fast and inexpensive. It did not have to be simple (advances in chip
technology took care of that) and could achieve a low cost by high volume
production.
PCI is a 64-bit interface in a 32-bit package. Figuring this out requires a bit
of arithmetic. The PCI bus runs at 33 MHz and can transfer 32-bits of data
(four bytes) every clock tick. That sounds like a 32-bit bus. However, a clock
tick at 33 MHz is 30 nanoseconds and memory only has a speed of 70
nanoseconds. When the CPU fetches data from RAM, it has to wait at least
three clock ticks for the data. By transferring data every clock tick, the PCI
bus can deliver the same throughput on a 32-bit interface that other parts
of the machine deliver through a 64-bit path.
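The throughput comparison above can be checked with a short calculation. A minimal sketch (the two-ticks-per-ISA-transfer figure comes from the text; the function name is ours):

```python
# Peak transfer rate implied by a bus's clock, width, and ticks per transfer.
def bandwidth_mb_per_s(clock_mhz, bytes_per_transfer, ticks_per_transfer):
    """Bytes moved per second, expressed in MB/s (1 MB = 10**6 bytes)."""
    transfers_per_second = clock_mhz * 1e6 / ticks_per_transfer
    return transfers_per_second * bytes_per_transfer / 1e6

# ISA: 8 MHz clock, 2 bytes per transfer, assume two clock ticks per transfer.
print(bandwidth_mb_per_s(8, 2, 2))    # 8.0 MB/s
# PCI: 33 MHz clock, 4 bytes per transfer, one clock tick per transfer.
print(bandwidth_mb_per_s(33, 4, 1))   # 132.0 MB/s
```

The factor-of-16 gap is why the text calls ISA too slow for disks and display adapters, while PCI's 32-bit path can keep up with a 64-bit path elsewhere in the machine.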
The PCI bus connects at one end to the CPU/memory bus and at the other
end to a more traditional I/O bus. The PCI interface chip may support the
video adapter, the EIDE disk controller chip, and maybe two external
adapter cards. A desktop machine will have only one PCI chip, and so it will
add a number of extra ISA-only slots. A server may add additional PCI chips,
and extra server slots will usually be EISA.
While ISA and EISA are exclusively PC interfaces, the PCI bus is now used
in Power Macintosh systems and PowerPC machines. It may be attractive
for minicomputers and other RISC workstations.
5.6 Summary
There must be paths for connecting the processor, memory, and I/O modules.
The collection of paths is called the interconnection structure or the bus
structure. We have introduced different types of transfers between modules,
such as memory to processor (CPU), CPU to memory, and I/O to CPU.
Buses are called address, data, or control buses depending upon the
information they carry, that is, addresses, data, or control signals
respectively. These buses can be dedicated to one type of information or
multiplexed; they can be centralized or distributed, and synchronous or
asynchronous.
Self Assessment Questions
1. The collection of paths for connecting the modules is called the
____________.
2. The location of the memory is provided by the input called as
____________.
3. A bus which carries a word to or from memory is called ____________.
4. A bus that is used to carry control signals is ____________.
5. The method of using the same bus for multiple purposes is known as
____________.
5.7 Terminal Questions
1. Discuss the different types of Bus.
2. Give the advantages and disadvantages of physical and functional
buses.
3. Explain the Single bus structure.
4. Discuss the relative merits and de-merits of single, two and three bus
structure.
5. Discuss the methods of arbitration.
5.8 Answers
Self Assessment Questions:
1. interconnection structure
2. address
3. data bus
4. control bus
5. time multiplexing
Terminal Questions:
1. Refer Section 5.2
2. Refer Section 5.3
3. Refer Section 5.4
4. Refer Section 5.4
5. Refer Section 5.3
Unit 6 Instruction Sets:
Addressing Modes and Formats
Structure:
6.1 Introduction
Objectives
Instruction Characteristics
Instruction representation
Instruction types
Number of addresses
Instruction Set Design
6.2 Types of Operands
Data types
IBM 370 Data types
VAX Data types
6.3 Types of Operations
Data transfer
Arithmetic
Logical
Conversion
I/O, system control
Transfer of control
System Control
6.4 Addressing Modes
Direct addressing mode
Immediate addressing mode
Indirect addressing mode
Register addressing mode
Register indirect addressing mode
Displacement addressing mode
Relative addressing mode
Base Register addressing Mode
Indexing
Stack addressing
Other additional addressing modes
6.5 Instruction formats
Instruction Length
Allocation of bits
Variable length instruction
6.6 Stacks & Subroutines
Stacks
Subroutines
6.7 Summary
6.8 Terminal Questions
6.9 Answers
6.1 Introduction
The operation of a CPU is determined by the instructions it executes. These
instructions are referred to as machine instructions.
The complete collection of instructions that are executed by a CPU is
referred to as CPU’s Instruction Set. The instructions are usually
represented in binary and the program is usually written in assembly
language.
Objectives:
By the end of Unit 6, you should be able to:
1. Discuss the elements of a machine instruction.
2. Discuss the data types supported by IBM and VAX machines.
3. Discuss different addressing modes.
4. Explain with example different types of operations.
Instruction Characteristics
Elements of a Machine Instruction: Each instruction must contain the
information required by the CPU for execution. Elements of a machine
Instruction are:
Operation Code: Specifies the operation to be performed. The
operation is specified by a binary code, known as the operation code,
or opcode.
Source Operand Reference: The operation may involve one or more
source operands; that is, operands that are the inputs for the operation.
Result Operand Reference: The operation may produce a result.
Next Instruction Reference: This tells the CPU where to fetch the next
instruction after the execution of the current instruction is complete.
The next instruction to be fetched is located in main memory, or, in the
case of a virtual memory system, in either main memory or secondary
memory. In most cases, the next instruction to be fetched immediately
follows the current instruction. In those cases, there is no explicit reference
to the next instruction. But when explicit reference is needed, then the main
memory or virtual memory address must be supplied. The form in which the
address is supplied is discussed in this section.
Source and Result operands can be in one of the three areas:
Main (or virtual) memory: The memory address must be supplied.
CPU register: CPU contains one or more registers that may be
referenced by machine instructions. If only one register exists, reference
to it may be implicit. If more than one register exists, then each register
is assigned a unique number, and the instruction must contain the
number of the desired register.
I/O device: The instruction must specify the I/O module or device for the
operation. If memory-mapped I/O is used, this is just another memory
address.
Instruction Representation
Each instruction is represented by a sequence of bits. The instruction is
divided into fields, corresponding to the constituent elements of the
instruction. This layout of the instruction is called Instruction Format. With
most instruction sets, more than one format is used. It is difficult for a
programmer and the reader to deal with binary representations of machine
instructions. Hence usually the instructions are written in symbolic
representations of machine code using English like language called
mnemonics.
Thus operation codes, in short Opcodes are represented using mnemonics.
The format for this instruction is given in figure 6.1.
Opcode Operand Reference Operand Reference
Figure 6.1: A simple instruction format
In most modern CPUs, the first byte contains the opcode, sometimes
including a register reference in some of the instructions. The operand
references are in the following bytes (byte 2, 3, and so on).
Examples of few mnemonics:
Table 6.1: Examples of mnemonics
Mnemonics Description
ADD add
SUB subtract
MUL Multiply
LOAD Load data from memory
MOV A, B Move contents of B register to A register, where A & B are user visible registers of CPU
Example 1: An instruction with a register and memory address operands.
8 bits 8 bits 16/32 bits
Opcode Operand Ref. Operand Reference
register reference memory address
Example 2: An instruction may span two bytes. For example, a 2-byte
instruction can be as follows:
12 bits 4 bits
Opcode Operand Ref.
register reference
For example consider an instruction,
ADD R, Y
Assuming that it means that add the contents of data location Y of the
memory to the contents of register R. The operation is performed on the
data present in location Y and not on its address i.e., Y itself.
Prior to the execution of the add instruction, suppose X = 153, Y = 154,
and R = 20, and that the content of memory at location 153 is 10 and at
location 154 it is 20.
After the execution of ADD R, Y the result will be in R:
R <- R + [Y] = 20 + 20 = 40.
Instruction Types
Example: High level language statement: X = X + Y
If we assume a simple set of machine instructions, this operation could be
accomplished with three instructions: (assume X is stored in memory
location 624, and Y in memory loc. 625.)
1. Load a register with the contents of memory location 624.
2. Add the contents of memory location 625 to the register,
3. Store the contents of the register in memory location 624.
As seen, a simple "C" (or BASIC) statement may require three machine
instructions.
As we have seen before, the instructions fall into one of the following four
categories:
Data processing: Arithmetic and logic instructions.
Data storage: Memory instructions.
Data movement: I/O instructions.
Control: Test and branch instructions.
Number of Addresses
What is the maximum number of addresses one might need in an
instruction?
Virtually all arithmetic and logic operations are either unary (one operand) or
binary (two operands). The result of an operation must be stored,
suggesting a third address. Finally, after the completion of an instruction,
the next instruction must be fetched, and its address is needed.
This line of reasoning suggests that an instruction could be required to
contain 4 address references: two operands, one result, and the address of
the next instruction. In practice, the address of the next instruction is
handled by the Program Counter (PC); therefore most instructions have
one, two or three operand addresses. Three-address instruction formats are
not common, because they require a relatively long instruction format to
hold three address references.
Table 6.2: Utilization of Instruction Addresses (Non-branching Instructions)

Number of Addresses   Symbolic Representation   Interpretation
3                     OP A, B, C                A <- B OP C
2                     OP A, B                   A <- A OP B
1                     OP A                      AC <- AC OP A (AC is the accumulator)
0                     OP                        T <- T OP (T-1) (T is the top of the stack)
Example 3: Program to execute Y = (A – B) / (C + D x E):
Solution:
A) With One-Address Instructions: (requires an accumulator AC)
Table 6.3: Solution to ex.3
LOAD D AC <- D
MUL E AC <- AC x E
ADD C AC <- AC + C
STOR Y Y <- AC
LOAD A AC <- A
SUB B AC <- AC – B
DIV Y AC <- AC / Y
STOR Y Y <- AC
B) With Two-Address Instructions:
Table 6.4: Solution to ex.3
MOVE Y, A Y <- A
SUB Y, B Y <- Y – B
MOVE T, D T <- D
MUL T, E T <- T x E
ADD T, C T <- T + C
DIV Y, T Y <- Y / T
C) With Three-Address Instructions:
Table 6.5: Solution to ex.3
SUB Y, A, B Y <- A – B
MUL T, D, E T <- D x E
ADD T, T, C T <- T + C
DIV Y, Y, T Y <- Y / T
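The one-address program of Table 6.3 can be traced with a minimal accumulator-machine interpreter. A sketch; the memory values for A through E are arbitrary sample data:

```python
# Minimal accumulator machine executing the one-address program of Table 6.3.
# Sample data: A=20, B=2, C=4, D=3, E=2, so Y = (20-2)/(4+3*2) = 1.8.
mem = {'A': 20, 'B': 2, 'C': 4, 'D': 3, 'E': 2, 'Y': 0}

program = [('LOAD', 'D'), ('MUL', 'E'), ('ADD', 'C'), ('STOR', 'Y'),
           ('LOAD', 'A'), ('SUB', 'B'), ('DIV', 'Y'), ('STOR', 'Y')]

ac = 0  # the accumulator
for op, addr in program:
    if op == 'LOAD':
        ac = mem[addr]
    elif op == 'STOR':
        mem[addr] = ac
    elif op == 'ADD':
        ac += mem[addr]
    elif op == 'SUB':
        ac -= mem[addr]
    elif op == 'MUL':
        ac *= mem[addr]
    elif op == 'DIV':
        ac /= mem[addr]

print(mem['Y'])  # 1.8
```

Note how the program must store the denominator into Y temporarily, because every instruction implicitly routes through the single accumulator.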
Instruction Set Design
The design of an instruction set is very complex, since it affects many
different aspects of the computer system. The instruction set defines many
of the functions performed by the CPU, and it is the programmer's means of
controlling the CPU.
The fundamental design issues in designing an instruction set are:
Operation Repertoire: How many and which operations to provide, and
how complex the operations should be.
Data Types: The various types of data upon which operations are
performed.
Instruction Format: Instruction length (in bits), number of addresses,
sizes of various fields, and so on.
Registers: Number of CPU registers that can be referenced by
instructions and their use.
Addressing: The mode or modes by which the address of an operand is
specified.
These issues are highly interrelated and must be considered together in
designing an instruction set.
6.2 Types of Operands
Data Types
The most important general categories of data are:
Addresses
Numbers
Characters
Logical data.
Numbers
All machine languages include numeric data types. Even in non-numeric data
processing, there is a need for numbers to act as counters, field widths,
and so on. The numbers stored in a computer are limited: there is a limit to
the magnitude of numbers that can be represented on a machine. The
programmer is also faced with understanding the consequences of rounding,
overflow, and underflow.
Three types of numerical data are commonly used in computers. They are:
1. Integer or fixed point
2. Floating point (real numbers)
3. Decimal (Remember BCD; an instruction set may be able to process
BCD numbers.)
1. Integer data type:
For unsigned integers: It is a straightforward, simple binary representation.
In an N-bit word, all N bits hold the magnitude of the number.
For example
00000000 = 0
00000001 = 1
00101001 = 41
10000000 = 128
11111111 = 255
For signed integers: The Most Significant Bit (MSB), which is the leftmost
bit in a word, is treated as a sign bit: if the MSB is 1 the number is
considered negative, otherwise positive. In an N-bit word, the remaining
(N-1) bits hold the magnitude of the number. (This is the sign-magnitude
representation.)
For example
00000000 = 0
00000001 = 1
00010010 = 18
10010010 = -18
11111111 = -127
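The two readings of an 8-bit pattern described above can be sketched as follows (the helper names are ours):

```python
# Interpret an 8-bit pattern as unsigned binary and as sign-magnitude.
def unsigned(bits):
    return int(bits, 2)

def sign_magnitude(bits):
    magnitude = int(bits[1:], 2)            # the low 7 bits hold the magnitude
    return -magnitude if bits[0] == '1' else magnitude

print(unsigned('00101001'))         # 41
print(sign_magnitude('10010010'))   # -18  (same low bits as +18)
print(sign_magnitude('11111111'))   # -127
```

The same bit pattern 10010010 reads as 146 unsigned but -18 in sign-magnitude; the interpretation is a property of the instruction, not of the stored bits.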
2. Floating point numbers: These numbers are represented as ±S × B^E,
where the sign is plus or minus,
S is the significand,
B is the base (2 for binary), and
E is the exponent.
Figure 6.2: Format of a floating point number (bit 0: sign bit;
bits 1–8: biased exponent; bits 9–31: significand)
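The field layout of Figure 6.2 (1 sign bit, an 8-bit biased exponent, a 23-bit significand) matches IEEE 754 single precision, so Python's struct module can pull the fields out of a real 32-bit float. A sketch; float_fields is our own helper name:

```python
import struct

# Split a 32-bit float into the fields of Figure 6.2: sign, 8-bit biased
# exponent, 23-bit significand. This is the IEEE 754 single-precision
# layout; the exponent bias is 127.
def float_fields(x):
    word, = struct.unpack('>I', struct.pack('>f', x))
    sign        = word >> 31
    exponent    = (word >> 23) & 0xFF
    significand = word & 0x7FFFFF
    return sign, exponent, significand

print(float_fields(1.0))    # (0, 127, 0)       1.0 = +1.0 x 2^(127-127)
print(float_fields(-2.5))   # (1, 128, 2097152) -2.5 = -1.25 x 2^(128-127)
```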
3. Decimal: Usual decimal system.
Characters
International Reference Alphabet (IRA) is referred to as ASCII in the USA.
Each character in this code is represented by a unique 7-bit pattern, thus
128 different characters can be represented. Some of the patterns represent
control characters. ASCII – encoded characters are usually stored and
transferred as 8 bits per character. The eighth bit may be set to 1 or 0 for
even parity. The EBCDIC character set is used in IBM 370 machines; it is an
8-bit code.
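The even-parity convention for the eighth bit can be sketched as follows (with_even_parity is our own helper name):

```python
# Compute the even-parity eighth bit for a 7-bit ASCII code: the parity
# bit is chosen so that the total number of 1s in the byte is even.
def with_even_parity(ch):
    code = ord(ch)                    # 7-bit ASCII code
    parity = bin(code).count('1') % 2
    return (parity << 7) | code       # parity goes in the high (eighth) bit

print(format(with_even_parity('A'), '08b'))  # 'A' = 1000001, two 1s -> parity 0
print(format(with_even_parity('C'), '08b'))  # 'C' = 1000011, three 1s -> parity 1
```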
Logical Data
When a word is viewed as a collection of individual bits rather than as a
single unit, memory can be used more efficiently for storing such data.
Logical values are also called Boolean values: 1 = true, 0 = false.
IBM 370 Data types
The IBM S/370 architecture provides the following data types.
Binary integer: Binary integer may be either unsigned or signed.
Signed binary integers are stored in 2's complement form. Allowable
lengths are 16 and 32 bits.
Floating point: Floating point numbers of length 32, 64, and 128 bits
are allowed. All use a 7-bit exponent field.
Decimal: Arithmetic on packed decimal integers is provided. The length
is from 1 to 16 bytes. The rightmost 4 bits of the rightmost byte hold the
sign. Hence signed numbers from 1 to 31 decimal digits can be
represented.
Binary logical: Operations are defined for data units of length 8, 32,
and 64 bits, and for variable-length logical data of up to 256 bytes.
Character: EBCDIC is used.
VAX Data types
The VAX provides an impressive array of data types. It is a byte-oriented
machine: all data types are multiples of bytes, including the 16-bit word,
the 32-bit longword, the 64-bit quadword, and even the 128-bit octaword.
The VAX provides the following five types of data.
Binary integer: Binary integers are usually treated as being in 2's
complement form; however, they can also be treated and operated on as
unsigned integers. Allowable lengths are 8, 16, 32, 64, and 128 bits.
Floating point: It provides four different types of representations. They
are
F: 32 bits with an 8-bit exponent
D: 64 bits with an 8-bit exponent
G: 64 bits with an 11-bit exponent
H: 128 bits with a 15-bit exponent
The F type is the normal or default representation. D is the usual
double-precision representation. G and H are provided for a variety of
applications, giving successively increasing range and precision over F.
Decimal: Arithmetic on packed decimal integers is provided. Two
formats are provided.
Packed decimal strings: The length is from 1 to 16 bytes with 4 bits holding
the sign.
Unpacked numeric strings: Store one digit per byte in ASCII representation,
with a length of up to 31 bytes.
Variable bit field: These are small integers packed together in large
data unit. A bit field is specified by three operands: address of byte
containing the start of field, starting bit position within the byte, the
length in bits of the field. This data type is used to increase memory
efficiency.
Character: ASCII is used.
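The three-operand bit-field specification above maps naturally onto code. A sketch, assuming the VAX's convention of numbering bits from the least significant bit of the base byte (extract_field is our own name; the data are arbitrary):

```python
# Extract a VAX-style variable bit field from a bytearray. The field is
# named by (byte address, starting bit position, length in bits), and may
# cross byte boundaries. Bits are numbered little-endian from the base byte.
def extract_field(memory, byte_addr, start_bit, length):
    # Gather enough bytes to cover the field, then shift and mask.
    span = (start_bit + length + 7) // 8
    word = int.from_bytes(memory[byte_addr:byte_addr + span], 'little')
    return (word >> start_bit) & ((1 << length) - 1)

mem = bytearray([0b10110100, 0b00000111])
print(extract_field(mem, 0, 2, 4))   # bits 2..5 of byte 0 -> 0b1101 = 13
print(extract_field(mem, 0, 6, 4))   # crosses into byte 1 -> 0b1110 = 14
```

Packing several small integers this way trades a little extraction work for memory efficiency, which is exactly the motivation the text gives.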
6.3 Types of Operations
A set of general types of operations is as follows:
Data transfer
It is the most fundamental type of machine instruction. The data transfer
must specify
1) The location of the source and destination. Each location can be
memory, register or top of stack.
2) The length of the data to be transferred.
3) The mode of addressing for each operand.
If both the operands are CPU registers, then the CPU simply causes data to
be transferred from one register to another. This operation is internal to the
CPU. If one or both operands are in memory, then the CPU must perform
some or all of the following actions:
1. Calculate the memory address, based on the address mode.
2. If the address refers to virtual memory, translate from virtual to actual
memory address.
3. Determine whether the addressed item is in the cache.
4. If not, issue a command to the memory module.
For example: Move, Store, Load (Fetch), Exchange, Clear (Reset), Set,
Push, Pop
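The four actions listed above can be sketched with toy stand-ins for the page table, cache, and main memory (every structure and value here is illustrative, not any real machine's):

```python
# Toy walk-through of the four actions for a memory operand: effective
# address calculation, virtual-to-physical translation, cache lookup,
# and a main-memory access on a miss.
page_table = {0: 5}                  # virtual page -> physical frame
cache = {0x5004: 42}                 # physical address -> cached word
main_memory = {0x5004: 42, 0x5008: 7}
PAGE = 0x1000                        # 4 KB pages

def read_operand(base, displacement):
    virtual = base + displacement                                   # step 1
    physical = page_table[virtual // PAGE] * PAGE + virtual % PAGE  # step 2
    if physical in cache:                                           # step 3
        return cache[physical], 'hit'
    value = main_memory[physical]                                   # step 4
    cache[physical] = value          # fill the cache on the way back
    return value, 'miss'

print(read_operand(0x0000, 0x4))   # (42, 'hit')
print(read_operand(0x0000, 0x8))   # (7, 'miss')
```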
Arithmetic
Most machines provide basic arithmetic functions like Add, Subtract,
Multiply, and Divide. They are invariably provided for signed integer
numbers. Often they are also provided for floating point and packed decimal
numbers. Some operations also involve only a single operand: Absolute
takes the absolute value of the operand, Negate takes the complement
(negative) of the operand, Increment adds 1 to the operand, and
Decrement subtracts 1 from the operand.
Logical
Machines also provide a variety of operations for manipulating individual bits
of a word often referred to as bit twiddling. They are based on Boolean
operations like AND, OR, NOT, XOR, Test, Compare, Shift, Rotate, Set
control variables.
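These bit-twiddling operations can be sketched on an 8-bit word (the value and bit positions are arbitrary):

```python
# Common bit-twiddling operations on an 8-bit word.
word = 0b10110010

set_bit3    = word | (1 << 3)           # OR with a mask sets a bit
clear_bit7  = word & ~(1 << 7) & 0xFF   # AND with a mask clears a bit
toggle_bit0 = word ^ 1                  # XOR flips a bit
shifted     = (word << 1) & 0xFF        # logical shift left, kept to 8 bits
rotated     = ((word << 1) | (word >> 7)) & 0xFF  # rotate left by 1

print(format(set_bit3, '08b'))    # 10111010
print(format(clear_bit7, '08b'))  # 00110010
print(format(rotated, '08b'))     # 01100101
```

Note the difference between the shift (the top bit falls off) and the rotate (the top bit wraps around to bit 0).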
Conversion
These instructions are the ones that change the format or operate on the
format of the data. Examples are Translate, Convert.
I/O, System control
There is a variety of approaches like isolated programmed I/O, memory
mapped I/O, DMA and so on. Examples: Output (Write), Start I/O, Test I/O.
Transfer of Control
In the normal course of events, the next instruction to be performed is the
one that immediately follows the current instruction in memory.
Transfer-of-control instructions, however, specify the location of the next
instruction that is to be executed. For example: Jump (Branch), Jump
Conditional, Jump to Subroutine, Return, Execute, Skip, Skip Conditional,
Halt, Wait (Hold), and No Operation.
System Control
These instructions are reserved for the use of Operating System. These
instructions can only be executed while the processor is in a privileged state,
or is executing a program in a special privileged area of memory.
6.4 Addressing Modes
In general, a program operates on data that reside in the computer’s
memory. These data can be organized in a variety of ways. If we want to
keep track of students’ names, we can write them in a list. If we want to
associate information with each name, for example to record telephone
numbers or marks in various courses, we may organize this information in
the form of a table. Programmers use organizations called data structures
to represent the data used in computations. These include lists, linked lists,
arrays, queues, and so on.
Programs are normally written in a high-level language, which enables the
programmer to use constants, local and global variables, pointers, and
arrays. When translating a high-level language program into assembly
language, the compiler must be able to implement these constructs using
the facilities provided in the instruction set of the computer in which the
program will be run. The different ways in which the location of an operand
is specified in an instruction are referred to as addressing modes.
There are the following addressing modes:
Immediate Addressing
Direct Addressing
Indirect Addressing
Register Addressing
Register Indirect Addressing
Displacement Addressing
Stack Addressing
Notation:
A = Contents of an address field in the instruction.
R = Contents of an address field in the instruction that refers to a
register.
EA = Effective (actual) Address of the location containing the referenced
operand.
(X) = Contents of location X.
Table 6.6: Basic Addressing Modes

Mode               Algorithm           Principal Advantage    Principal Disadvantage
Immediate          Operand = A         No memory reference    Limited operand magnitude
Direct             EA = A              Simple                 Limited address space
Indirect           EA = (A)            Large address space    Multiple memory references
Register           EA = R              No memory reference    Limited address space
Register Indirect  EA = (R)            Large address space    Extra memory reference
Displacement       EA = A + (R)        Flexibility            Complexity
Stack              EA = top of stack   No memory reference    Limited capability
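The Algorithm column of Table 6.6 maps directly onto code. A sketch with a toy memory and register file (all addresses, values, and the function name are made up for illustration):

```python
# Toy machine state for illustrating the algorithms of Table 6.6.
memory = {100: 7, 200: 100, 300: 55, 400: 9}
registers = {'R1': 300, 'R2': 12}

def fetch_operand(mode, A=None, R=None):
    if mode == 'immediate':         return A                          # operand = A
    if mode == 'direct':            return memory[A]                  # EA = A
    if mode == 'indirect':          return memory[memory[A]]          # EA = (A)
    if mode == 'register':          return registers[R]               # EA = R
    if mode == 'register_indirect': return memory[registers[R]]       # EA = (R)
    if mode == 'displacement':      return memory[A + registers[R]]   # EA = A + (R)
    raise ValueError(mode)

print(fetch_operand('immediate', A=200))            # 200
print(fetch_operand('direct', A=200))               # 100
print(fetch_operand('indirect', A=200))             # 7
print(fetch_operand('register', R='R2'))            # 12
print(fetch_operand('register_indirect', R='R1'))   # 55
print(fetch_operand('displacement', A=100, R='R1')) # 9
```

The same address field 200 yields three different operands depending on the mode, which is the whole point of the table.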
Direct Addressing Mode
EA = A.
Address field contains address of operand.
Effective address (EA) = address field (A) e.g. ADD A
Add contents of cell A to accumulator.
Look in memory at address A for operand.
Single memory reference to access data.
No additional calculations to work out effective address.
Limited address space
Figure 6.3
– The operand is in a memory location; the address of this location is given
explicitly in the instruction. (In some assembly languages, this mode is
called Absolute.)
The instruction Move LOC, R2 uses two of these modes: the memory
location LOC is specified in the Direct (Absolute) mode, and R2 in the
Register mode. Processor registers are used as temporary storage locations,
where the data in a register are accessed using the Register mode. The
Absolute mode can represent
global variables in a program. A declaration such as Integer A, B; in a high-
level language program will cause the compiler to allocate a memory
location to each of the variables A and B. Whenever they are referenced
later in the program, the compiler can generate assembly language
instructions that use the Absolute mode to access these variables. Next, let
us consider the representation of constants. Address and data constants
can be represented in assembly language using the immediate mode.
Immediate Addressing Mode
The operand is actually present in the instruction.
Operand = A
Can be used to define and use constants, or set initial values.
Operand is part of instruction
Operand = address field
e.g. ADD 5
Add 5 to contents of accumulator
5 is operand
No memory reference to fetch data
Fast
Limited range
Figure 6.4: Immediate
For example, the instruction Move 200immediate, R0 (where the subscript
names the addressing mode) places the value 200 in register R0. Clearly,
the Immediate mode is only used to specify the value of a source operand.
However, using a subscript to denote the addressing mode is not practical
in assembly languages. A common convention is to use the sharp sign (#) in
front of the value to indicate that this value is to be used as an
immediate operand.
Hence, we write the instruction above in the form Move # 200, R0. Constant
values are used frequently in high-level language programs. For example,
the statement
A = B + 6 contains the constant 6. Assuming that A and B have been
declared earlier as variables and may be accessed using the Absolute
mode, this statement may be compiled as follows:
Move B, R1
Add #6, R1
Move R1, A
Constants are also used in assembly language to increment a counter, test
for some bit pattern, and so on.
Indirect Addressing Mode
Memory cell pointed to by address field contains the address of (pointer to) the operand
EA = (A)
Look in A, find address (A) and look there for operand
e.g. ADD (A)
Add contents of cell pointed to by contents of A to accumulator
Large address space
2^n where n = word length
May be nested, multilevel, cascaded
e.g. EA = (((A))) Draw the diagram yourself
Multiple memory accesses to find operand
Hence slower
Figure 6.5
Indirect mode – The effective address of the operand is the contents of a
register or memory location whose address appears in the instruction.
We denote indirection by placing the name of the register or the memory
address given in the instruction in parentheses as illustrated in Figure and
Table.
To execute the Add instruction in Figure 6.6a the processor uses the value
B, which is in register R1, as the effective address of the operand. It
requests a read operation from the memory to read the contents of location
B. The value read is the desired operand, which the processor adds to the
contents of register R0. Indirect addressing through a memory location is
also possible as shown in Figure 6.6b. In this case, the processor first reads
the contents of memory location A,
Figure 6.6
Figure 6.7
then requests a second read operation using the value B as an address to
obtain the operand. The register or memory location that contains the
address of an operand is called a pointer. Indirection and the use of pointers
are important and powerful concepts in programming. Consider the analogy
of a treasure hunt: In the instructions for the hunt you may be told to go to a
house at a given address. Instead of finding the treasure there, you find a
note that gives you another address where you will find the treasure.
By changing the note, the location of the treasure can be changed, but the
instructions for the hunt remain the same. Changing the note is equivalent to
changing the contents of a pointer in a computer program. For example, by
changing the contents of register R1 or location A in Figure the same Add
instruction fetches different operands to add to register R0.
For adding a list of numbers, indirect addressing can be used to access
successive numbers in the list, resulting in the program shown in Figure 6.7.
Register R2 is used as a pointer to the numbers in the list, and the operands
are accessed indirectly through R2. The initialization section of the program
loads the counter value n from memory location N into R1 and uses the
immediate addressing mode to place the address value NUM1, which is the
address of the first number in the list, into R2. Then it clears R0 to 0.
The instruction Add (R2), R0 fetches the operand at location NUM1 and
adds it to R0. The second Add instruction adds 4 to the contents of the
pointer R2, so that it will contain the address value NUM2 when the above
instruction is executed in the second pass through the loop.
Consider the C-language statement A = *B; where B is a pointer variable.
This statement may be compiled into Move B, R1
Move (R1), A
Using indirect addressing through memory, the same action can be
achieved with Move (B), A. Despite its apparent simplicity, indirect
addressing through memory has proven to be of limited usefulness as an
addressing mode, and it is seldom found in modern computers.
Indirect addressing through registers is used extensively. The program in
Figure 6.7 shows the flexibility it provides. Also, when absolute addressing is
not available, indirect addressing through registers makes it possible to
access global variables by first loading the operand’s address in a register.
Register Addressing Mode
o Operand is held in the register named in the address field
o EA = R
o Limited number of registers
o Very small address field needed
o Shorter instructions
o Faster instruction fetch
o No memory access
o Very fast execution
o Very limited address space
o Multiple registers help performance
o Requires good assembly programming or compiler writing
o N.B. in C programming: register int a;
Figure 6.8
Figure 6.9
Figure 6.10
Figure 6.11
Register Indirect Addressing Mode
o EA = (R)
o Operand is in the memory cell pointed to by the contents of register R
o Large address space (2^n)
o Fewer memory accesses than indirect addressing
Indexing and pointers
The next addressing mode we discuss provides a different kind of flexibility
for accessing operands. It is useful in dealing with lists and arrays.
Index mode – The effective address of the operand is generated by adding
a constant value to the contents of a register. The register used may be
either a special register provided for this purpose, or, more commonly, it
may be any one of a set of general-purpose registers in the processor. In
either case, it is referred to as an index register. We indicate the Index
mode symbolically as X(Ri), where X denotes the constant value contained
in the instruction and Ri is the name of the register involved. The effective
address of the operand is given by EA = X + [Ri]. The contents of the index
register are not changed in the process of generating the effective address.
In an assembly language program, the constant X may be given either as an
explicit number or as a symbolic name representing a numerical value.
When the instruction is translated into machine code, the constant X is given
as a part of the instruction and is usually represented by fewer bits than the
word length of the computer. Since X is a signed integer, it must be sign-
extended to the register length before being added to the contents of the
register.
Figure 6.12 illustrates two ways of using the Index mode. In Figure 6.12a,
the index register, R1, contains the address of a memory location, and the
value X defines an offset (also called a displacement) from this address to
the location where the operand is found.
An alternative use is illustrated in Figure 6.12b. Here, the constant X
corresponds to a memory address, and the contents of the index register
define the offset to the operand. In either case, the effective address is the
sum of two values; one is given explicitly in the instruction, and the other is
stored in a register.
Figure 6.12
Displacement Addressing Mode
o EA = A + (R)
o Address field holds two values:
o A = base value
o R = register that holds the displacement
o (or vice versa)
Relative Addressing Mode
o A version of displacement addressing
o R = Program counter, PC
o EA = A + (PC)
o i.e. the operand is A cells away from the location currently pointed to by the PC
Base-Register Addressing
o A holds displacement
o R holds pointer to base address
o R may be explicit or implicit
o e.g. segment registers in 80x86
Indexing
o A = base
o R = displacement
o EA = A + R
o Good for accessing arrays:
EA = A + R
R++
Stack Addressing
o Operand is (implicitly) on top of stack
o e.g. ADD: pop the top two items from the stack and add them
Other additional addressing modes
The two modes described next are useful for accessing data items in
successive locations in the memory.
Autoincrement mode – The effective address of the operand is the
contents of a register specified in the instruction. After accessing the
operand, the contents of this register are automatically incremented to point
to the next item in a list.
We denote the Autoincrement mode by putting the specified register in
parentheses, to show that the contents of the register are used as the
effective address, followed by a plus sign to indicate that these contents are
to be incremented after the operand is accessed. Thus, the Autoincrement
mode is written as (Ri)+.
6.5 Instruction Formats
Instruction Format is defined as the layout of bits in an instruction, in terms
of its constituent parts. An instruction format must include an opcode,
implicitly or explicitly, and one or more operands. In practice, most
instruction sets have more than one instruction format.
Instruction Length
The most important design issue is the length of an instruction. It both
affects and is affected by memory size, memory organization, bus structure,
CPU complexity, and CPU speed. There is a trade-off between a powerful
instruction repertoire and saving space.
Apart from this trade-off, there are other considerations. Either the
instruction length should be equal to the memory transfer length, or one
should be a multiple of the other. A related consideration is the memory
transfer rate.
Allocation of Bits
The factors that determine the use of addressing bits are:
Number of addressing modes: Sometimes the addressing mode is implicit
in the instruction, or certain opcodes may call for indexing. In other cases
the addressing mode must be explicit, and one or more bits are needed.
Number of operands: Today's machines typically provide for two operands.
Each operand may require its own mode indicator, or the use of a mode
indicator may be limited to one of the address fields.
Register versus memory: A machine must have registers so that the
data can be brought into the CPU for processing. One operand address
is implicit. The more that registers can be used to specify operands, the
fewer bits are needed.
Number of register sets: Almost all machines have a set of general
purpose registers, with typically 8 or 16 registers in it. These registers
can be used to store data or addresses for displacement addressing etc.
Address range: For addresses that refer to the memory locations, the
range of addresses is related to the number of address bits. Because of
this limitation direct addressing is rarely used.
Address granularity: It is concerned with addresses that refer to the
memory other than registers. In a system with 16 or 32 bit words, an
address can refer to a word or a byte, at the designer's choice. Byte
addressing is convenient for character manipulation but, for a fixed-size
memory, requires more address bits.
Variable-Length Instruction
The designer might provide a variety of instruction formats of different
lengths. Addressing can be more flexible, with various combinations of
registers and memory references plus addressing modes.
The price that needs to be paid is an increase in the complexity of the CPU:
1. due to the varying number of operands, and
2. due to the varying lengths of the opcode in some CPUs.
6.6 Stacks and Subroutines
Stacks
A computer program often needs to perform a particular subtask using the
familiar subroutine structure. In order to organize the control and information
linkage between the main program and the subroutine, a data structure
called a stack is used.
This section will describe stacks, as well as a closely related data structure
called a queue.
Data operated on by a program can be organized in a variety of ways. We
have already encountered a data structure called a list. Now, we consider an
important data structure known as a stack. A stack is a list of data elements,
usually words or bytes, with the accessing restriction that elements can be
added or removed at one end of the stack only. This end is called the top of
the stack, and the other end is called the bottom.
The structure is sometimes referred to as a pushdown stack. Imagine a pile
of trays in a cafeteria; customers pick up new trays from the top of the pile,
and clean trays are added to the pile by placing them onto the top of the pile.
Another descriptive phrase, last-in-first-out (LIFO) stack, is also used to
describe this type of storage mechanism; the last data item placed on the
stack is the first one removed when retrieval begins. The terms push and
pop are used to describe placing a new item on the stack and removing the
top item from the stack, respectively.
Data stored in the memory of a computer can be organized as a stack, with
successive elements occupying successive memory locations. Assume that
the first element is placed in location BOTTOM, and when new elements are
pushed onto the stack, they are placed in successively lower address
locations. We use a stack that grows in the direction of decreasing memory
addresses in our discussion, because this is a common practice.
Figure 6.13 shows a stack of word data items in the memory of a computer.
It contains numerical values, with 43 at the bottom and 28 at the top. A
processor register is used to keep track of the address of the element of the
stack that is at the top at any given time. This register is called the stack
pointer (SP). It could be one of the general-purpose registers or a register
dedicated to this function.
Figure 6.13
If we assume a byte-addressable memory with a 32-bit word length, the
push operation can be implemented as
Subtract #4, SP
Move NEWITEM, (SP)
where the Subtract instruction subtracts the source operand 4 from the
destination operand contained in SP and places the result in SP. These two
instructions move the word from location NEWITEM onto the top of the
stack, decrementing the stack pointer by 4 before the move. The pop
operation can be implemented as
Move (SP), ITEM
Add #4, SP
These two instructions move the top value from the stack into location ITEM
and then increment the stack pointer by 4 so that it points to the new top
element. Figure 6.14 shows the effect of each of these operations on the
stack in Figure 6.13. If the processor has the Autoincrement and
Autodecrement addressing modes, then the push operation can be
performed by the single instruction Move NEWITEM, -(SP), and the pop
operation can be performed by Move (SP)+, ITEM.
When a stack is used in a program, it is usually allocated a fixed amount of
space in the memory. In this case, we must avoid pushing an item onto the
stack when the stack has reached its maximum size. Also, we must avoid
attempting to pop an item off an empty stack, which could result from a
programming error. Suppose that a stack runs from location 2000
(BOTTOM) down no further than location 1500. The stack pointer is loaded
initially with the address value 2004. Recall that SP is decremented by 4
before new data are stored on the stack. Hence, an initial value of 2004
means that the first item pushed onto the stack will be at location 2000. To
prevent either pushing an item on a full stack or popping an item off an
empty stack, the single-instruction push and pop operations can be replaced
by the instruction sequences shown in Figure 6.15.
Compare instruction: Compare src, dst performs the operation [dst] - [src]
and sets the condition code flags according to the result. It does not change
the value of either operand.
Figure 6.14
Figure 6.15
Subroutines
In a given program, it is often necessary to perform a particular subtask
many times on different data values. Such a subtask is usually called a
subroutine. For example, a subroutine may evaluate the sine function or
sort a list of values into increasing or decreasing order. It is possible to
include the block of instructions that constitute a subroutine at every place
where it is needed in the program. However, to save space, only one copy
of the instructions that constitute the subroutine is placed in the memory,
and any program that requires the use of the subroutine simply branches to
its starting location. When a program branches to a subroutine we say that it
is calling the subroutine. The instruction that performs this branch operation
is named a Call instruction.
After a subroutine has been executed, the calling program must resume
execution, continuing immediately after the instruction that called the
subroutine. The subroutine is said to return to the program that called it by
executing a Return instruction. Since the subroutine may be called from
different places in a calling program, provision must be made for returning to
the appropriate location. The location where the calling program resumes
execution is the location pointed to by the updated PC while the Call
instruction is being executed. Hence, the contents of the PC must be saved
by the Call instruction to enable correct return to the calling program.
Figure 6.16: An example of CALL and RETURN
The way in which a computer makes it possible to call and return from
subroutines is referred to as its subroutine linkage method. The simplest
subroutine linkage method is to save the return address in a specific
location, which may be a register dedicated to this function. Such a register
is called the link register. When the subroutine completes its task, the
Return instruction returns to the calling program by branching indirectly
through the link register.
The Call instruction is just a special branch instruction that performs the
following operations:
Store the contents of the PC in the link register
Branch to the target address specified by the instruction
The Return instruction is a special branch instruction that performs the
operation:
Branch to the address contained in the link register. Figure 6.16 illustrates
this procedure.
6.7 Summary
This chapter covers the representation of instructions and discusses
Arithmetic and logic instructions, Memory instructions, I/O instructions, and
Test and branch instructions. We have also seen different types of
addressing modes like direct, indirect, immediate and register, including the
important concepts of pointers and indexed addressing. Here we also see
the different data types like Addresses, Numbers, Characters, and Logical
data. Numbers can be integers or decimal (floating point). As an example
we have studied data types for two machines IBM and VAX. Finally the
concept of subroutine with the application of the stack data structure is
discussed.
Self Assessment Questions
1. ___________ specifies the operation to be performed.
2. Opcodes are represented using ___________.
3. Floating point numbers are ___________.
4. ___________ is referred as ASCII in USA.
5. ___________ instructions are reserved for the use of operating system.
6.8 Terminal Questions
1. Define the following:
a. Op-code.
b. Machine instruction.
2. Discuss the different categories of instructions.
3. Write a note on different data types with examples for each.
4. Give the details of Data types specified for VAX & IBM 370 machines.
5. Discuss in detail, with examples, the addressing modes that do not use
memory.
6. Explain the calculation of effective address in case of Displacement
addressing and relative addressing modes.
7. Write a note on:
a. Stack
b. Subroutines
6.9 Answers
Self Assessment Questions:
1. operation code or opcode
2. mnemonics
3. real numbers
4. international reference alphabet (IRA)
5. system control
Terminal Questions:
1. Refer Section 6.1
2. Refer Section 6.1
3. Refer Section 6.2
4. Refer Section 6.2
5. Refer Section 6.4
6. Refer Section 6.4
7. Refer Section 6.6
Unit 7 Arithmetic Logic Unit
Structure:
7.1 Introduction
Objectives
7.2 Arithmetic Logic Unit
7.3 Number Representations
Non-negative Integers
Negative Integers
Infinite-Precision Ten's Complement
Finite-Precision Ten's Complement
Finite-Precision Two's Complement
Rational Numbers
7.4 Summary
7.5 Terminal Questions
7.6 Answers
7.1 Introduction
In this unit we focus on the ALU, which together with the control unit forms
the main components of the processing unit. We have seen in the
preceding units the tasks that the CPU must perform to execute an
instruction: it fetches the instruction, interprets it, fetches data, processes
the data, and finally writes the result into the appropriate location. An
internal CPU bus is needed to transfer data between the various registers
and the ALU, because the ALU operates only on data in the internal
memory of the CPU, that is, its registers.
Objectives:
By the end of Unit 7, you should be able to:
1. Explain the ALU
2. Discuss the different number representations.
7.2 Arithmetic Logic Unit
The ALU is the part of the CPU that actually performs arithmetic and logical
operations on data. All of the other elements of the computer system -
control unit, registers, memory, I/O - are there mainly to bring data into the
ALU for it to process and then to take the results back out.
Figure 7.1: ALU Inputs and Outputs
The inputs and outputs of the ALU are shown in Figure 7.1. The inputs to
the ALU are the control signals generated by the control unit of the CPU
and the registers of the CPU where the operands are stored. The outputs
are a register called the status word or flag register, which reflects the
result, and the registers of the CPU where the result can be stored. Thus
data are presented to the ALU in registers, and the results of an operation
are also stored in registers. These registers are connected by signal paths
to the ALU. The ALU does not interact directly with memory or other parts
of the system (e.g. I/O modules); it interacts directly only with registers. An
ALU, like all other electronic components of a computer, is
based on the use of simple digital devices that store binary digits and
perform Boolean logic operations.
The control unit is responsible for moving data to memory or I/O
modules. Also, it is the control unit that signals all the operations that
happen in the CPU. The operations, functions and implementation of
Control Unit will be discussed in the tenth unit.
In this unit we will concentrate on the ALU. An important part of the use of
logic circuits is for computing various mathematical operations such as
addition, multiplication, trigonometric operations, etc. Hence we will be
discussing the arithmetic involved in using ALU.
First, before discussing the computer arithmetic we must have a way of
representing numbers as binary data.
7.3 Number Representations
Computers are built using logic circuits that operate on information
represented by two valued electrical signals. We label the two values as 0
and 1; and we define the amount of information represented by such a
signal as a bit of information, where bit stands for binary digit. The most
natural way to represent a number in a computer system is by a string of
bits, called a binary number. A text character can also be represented by a
string of bits called a character code. We will first describe binary number
representations and arithmetic operations on these numbers, and then
describe character representations.
Non-negative Integers
The easiest numbers to represent are the non-negative integers. To see
how this can be done, recall how we represent a number in the decimal
system. A number such as 2034 is interpreted as:
2*10^3 + 0*10^2 + 3*10^1 + 4*10^0
But there is nothing special with the base 10, so we can just as well use
base 2. In base 2, each digit value is either 0 or 1, which we can represent,
for instance, by false and true, respectively.
In fact, we have already hinted at this possibility, since we usually write 0
and 1 instead of false and true.
All the normal algorithms for decimal arithmetic have versions for binary
arithmetic, except that they are usually simpler.
Negative Integers
Things are easy as long as we stick to non-negative integers. They become
more complicated when we want to represent negative integers as well.
In binary arithmetic, we simply reserve one bit to determine the sign. In the
circuitry for addition, we would have one circuit for adding two numbers, and
another for subtracting two numbers. The combination of signs of the two
inputs would determine which circuit to use on the absolute values, as well
as the sign of the output.
While this method works, it turns out that there is one that is much easier to
deal with by electronic circuits. This method is called the ‘two's
complement’ method. It turns out that with this method, we do not need a
special circuit for subtracting two numbers.
In order to explain this method, we first show how it would work in decimal
arithmetic with infinite precision. Then we show how it works with binary
arithmetic, and finally how it works with finite precision.
Infinite-Precision Ten's Complement
Imagine the odometer of an automobile. It has a certain number of wheels,
each with the ten digits on it. When one wheel goes from 9 to 0, the wheel
immediately to the left of it advances by one position. If that wheel already
showed 9, it too goes to 0 and advances the wheel to its left, etc.
Now suppose we have an odometer with an infinite number of wheels. We
are going to use this infinite odometer to represent all the integers.
When all the wheels are 0, we interpret the value as the integer 0.
A positive integer n is represented by an odometer position obtained by
advancing the rightmost wheel n positions from 0. Notice that for each such
positive number, there will be an infinite number of wheels with the value 0
to the left.
A negative integer n is represented by an odometer position obtained by
decreasing the rightmost wheel n positions from 0. Notice that for each such
negative number, there will be an infinite number of wheels with the value 9
to the left.
In fact, we don't need an infinite number of wheels. For each number only a
finite number of wheels are needed. We simply assume that the leftmost
wheel (which will be either 0 or 9) is duplicated an infinite number of times to
the left.
While for each number we only need a finite number of wheels, the number
of wheels is unbounded, i.e., we cannot use a particular finite number of
wheels to represent all the numbers. The difference is subtle but important
(but perhaps not that important for this particular course). If we need an
infinite number of wheels, then there is no hope of ever using this
representation in a program, since that would require an infinite-size
memory. If we only need an unbounded number of wheels, we may run out
of memory, but we can represent a lot of numbers (each of finite size) in a
useful way. Since any program that runs in finite time only uses a finite
number of numbers, with a large enough memory, we might be able to run
our program.
Now suppose we have an addition circuit that can handle non-negative
integers with an infinite number of digits. In other words, when given a
number starting with an infinite number of 9s, it will interpret this as an
infinitely large positive number, whereas our interpretation of it will be a
negative number.
Let us say, we give this circuit the two numbers ...9998 (which we interpret
as -2) and ...0005 (which we interpret as +5). It will add the two numbers.
First it adds 8 and 5 which gives 3 and a carry of 1. Next, it adds 9 and the
carry 1, giving 0 and a carry of 1. For all remaining (infinitely many)
positions, the value will be 0 with a carry of 1, so the final result is ...0003.
This result is the correct one, even with our interpretation of negative
numbers. You may argue that the carry must end up somewhere, and it
does, but in infinity. In some ways, we are doing arithmetic modulo infinity.
Some implementations of some programming languages with arbitrary
precision integer arithmetic (Lisp for instance) use exactly this
representation of negative integers.
Finite-Precision Ten's Complement
What we have said in the previous section works almost as well with a fixed
bounded number of odometer wheels. The only problem is that we have to
deal with overflow and underflow.
Suppose we have only a fixed number of wheels say 3. In this case, we
shall use the convention that if the leftmost wheel shows a digit between 0
and 4 inclusive, then we have a positive number, equal to its representation.
When instead the leftmost wheel shows a digit between 5 and 9 inclusive,
we have a negative number, whose absolute value can be computed with
the method described in the previous section.
We now assume that we have a circuit that can add positive three-digit
numbers, and we shall see how we can use it to add negative numbers in
our representation.
Suppose, again we want to add -2 and +5. The representations for these
numbers with three wheels are 998 and 005 respectively. Our addition
circuit will attempt to add the two positive numbers 998 and 005, which
gives 1003. But since the addition circuit only has three digits, it will truncate
the result to 003, which is the right answer for our interpretation.
A valid question at this point is in which situation our finite addition circuit will
not work. The answer is somewhat complicated. It is clear that it always
gives the correct result when a positive and a negative number are added. It
is incorrect in two situations. The first situation is when two positive numbers
are added, and the result comes out looking like a negative number, i.e. with
a first digit somewhere between 5 and 9. You should convince yourself that
no addition of two positive numbers can yield an overflow and still look like a
positive number. The second situation is when two negative numbers are
added and the result comes out looking like a non-negative number, i.e. with
a first digit somewhere between 0 and 4. Again, you should convince
yourself that no addition of two negative numbers can yield an underflow
and still look like a negative number.
We now have a circuit for addition of integers (positive or negative) in our
representation. We simply use a circuit for addition of only positive numbers,
plus some circuits that check:
If both numbers are positive and the result is negative, then report
overflow.
If both numbers are negative and the result is positive, then report
underflow.
Finite-Precision Two's Complement
So far, we have studied the representation of negative numbers using ten's
complement. In a computer, we prefer using base two rather than base ten.
Luckily, the exact method described in the previous section works just as
well for base two. For an n-bit adder (n is usually 32 or 64), we can
represent positive numbers with a leftmost digit of 0, which gives values
between 0 and 2^(n-1) - 1, and negative numbers with a leftmost digit of 1,
which gives values between -2^(n-1) and -1.
Exactly the same rule works for overflow and underflow detection. If, when
adding two positive numbers, we get a result that looks negative (i.e. with its
leftmost bit 1), then we have an overflow. Similarly, if, when adding two
negative numbers, we get a result that looks positive (i.e. with its leftmost bit
0), then we have an underflow.
Rational Numbers
Integers are useful, but sometimes we need to compute with numbers that
are not integer.
An obvious idea is to use rational numbers. Many algorithms, such as the
simplex algorithm for linear optimization, use only rational arithmetic
whenever the input is rational.
There is no particular difficulty in representing rational numbers in a
computer. It suffices to have a pair of integers, one for the numerator and
one for the denominator.
To implement arithmetic on rational numbers, we can use some additional
restrictions on our representation. We may, for instance, decide that:
Positive rational numbers are always represented as two positive
integers (the other possibility is as two negative numbers),
Negative rational numbers are always represented with a negative
numerator and a positive denominator (the other possibility is with a
positive numerator and a negative denominator),
The numerator and the denominator are always relative prime (they
have no common factors).
Such a set of rules makes sure that our representation is canonical, i.e., that
the representation for a value is unique, even though, a priori, many
representations would work.
Circuits for implementing rational arithmetic would have to take such rules
into account. In particular, the last rule would imply dividing the two integers
resulting from every arithmetic operation with their largest common factor to
obtain the canonical representation.
Rational numbers and rational arithmetic are not very common in the
hardware of a computer. The reason is probably that rational numbers don't
behave very well with respect to the size of the representation. For rational
numbers to be truly useful, their components, i.e., the numerator and the
denominator both need to be arbitrary-precision integers. As we have
mentioned before, arbitrary precision anything does not go very well with
fixed-size circuits inside the CPU of a computer.
Programming languages, on the other hand, sometimes use arbitrary-
precision rational numbers. This is the case, in particular, with the language
Lisp.
7.4 Summary
The ALU operates only on data held in the registers that are internal to the
CPU. All computers deal with numbers. We have studied the instructions
that perform basic arithmetic operations on data operands in the previous
unit. To understand the task carried out by the ALU, we have introduced the
representation of numbers in a computer and how they are manipulated in
addition and subtraction operations.
Self Assessment Questions
1. In binary arithmetic, we simply reserve ___________ bit to determine
the sign.
2. The infinite precision ten’s complement of -2 and +5 is ___________
and ___________.
3. The finite precision three digit ten’s complement of -2 and +5 is
___________ and ___________.
4. Using finite precision ten’s complement, if both numbers for addition are
positive and the result is negative, then the circuit reports ___________.
5. ___________ is not very common in the hardware of a computer.
7.5 Terminal Questions
1. Explain the ALU operations.
2. Discuss various number representations in a computer system.
7.6 Answers
Self Assessment Questions:
1. one
2. 98, 05
3. 998, 005
4. overflow
5. rational numbers and rational arithmetic
Terminal Questions:
1. Refer Section 7.2
2. Refer Section 7.3
Unit 8 Binary Arithmetic
Structure:
8.1 Introduction
Objectives
8.2 Binary Arithmetic
Overflow in Integer Arithmetic
Binary Addition
Subtraction
Another Note on Overflow
Multiplication
Unsigned Integer Multiplication:
Straightforward Method
Unsigned Integer Multiplication:
A More Efficient Method
Positive Integer Multiplication
Signed Integer Multiplication
Division
8.3 Floating Point Numbers
Floating Point Variables
Floating Point Arithmetic
Addition of Floating-Point Numbers
Time for Floating-Point Addition
Pipelined Floating-Point Addition
8.4 Real Numbers
8.5 Summary
8.6 Terminal Questions
8.7 Answers
8.1 Introduction
In this unit we focus on how various arithmetic operations like addition,
subtraction, multiplication, and division of binary numbers are performed in
computers. It also discusses the representation of floating point numbers
and rational numbers.
Objectives:
By the end of Unit 8, you should be able to:
1. Compute the addition of signed and unsigned integers.
2. Compute the addition of floating point numbers.
3. Compute the multiplication and division of signed and unsigned integers.
8.2 Binary Arithmetic
Inside a computer system, all operations are carried out on fixed-length
binary values that represent application-oriented values. The schemes used
to encode the application information have an impact on the algorithms for
carrying out the operations. The unsigned (binary number system) and
signed (2’s complement) representations have the advantage that addition
and subtraction operations have simple implementations, and that the same
algorithm can be used for both representations. This note discusses
arithmetic operations on fixed-length binary strings, and some issues in
using these operations to manipulate information.
It might be reasonable to hope that the operations performed by a computer
always result in correct answers. The answers are indeed always correct,
but we must be careful about what is meant by correct.
Computers manipulate fixed-length binary values to produce fixed-length
binary values. The computed values are correct according to the algorithms
that are used; however, it is not always the case that the computed value is
correct when the values are interpreted as representing application
information. Programmers must appreciate the difference between
application information and fixed-length binary values in order to recognize
when a computed value correctly represents application information!
A limitation in the use of fixed-length binary values to represent application
information is that only a finite set of application values can be represented
by the binary values. What happens if applying an operation on values
contained in the finite set results in an answer that is outside the set? For
example, suppose that 4-bit values are used to encode counting numbers,
thereby restricting the set of represented numbers to 0 .. 15. The values 4
and 14 are inside the set of represented values. Performing the operation
4 + 14 should result in 18; however, 18 is outside the set of represented
numbers. This situation is called overflow, and programs must always be
written to deal with potential overflow situations.
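The 4-bit situation above can be sketched in a few lines of Python; the function name is ours, for illustration:

```python
def add4_unsigned(x, y):
    """4-bit unsigned addition: keep the low 4 bits, and flag overflow via
    the carry out of the most significant bit."""
    total = x + y
    return total & 0b1111, total > 0b1111

print(add4_unsigned(4, 14))   # (2, True): 18 lies outside 0..15
print(add4_unsigned(4, 3))    # (7, False)
```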
Overflow in Integer Arithmetic
Using four bits, the range of numbers that can be represented is -8 through
+7. When the result of an arithmetic operation is outside the representable
range, an arithmetic overflow has occurred.
When adding unsigned numbers, the carry-out, cn, from the most significant
bit position serves as the overflow indicator. However, this does not work for
adding signed numbers. For example, when using 4-bit signed numbers, if
we try to add the numbers +7 and +4, the output sum vector, S, is 1011,
which is the code for -5, an incorrect result. The carry-out signal from the
MSB position is 0. Similarly, if we try to add -4 and -6, we get S = 0110 = +6,
another incorrect result, and in this case, the carry-out signal is 1. Thus,
overflow may occur if both summands have the same sign. Clearly, the
addition of numbers with different signs cannot cause overflow. This leads to
the following conclusions:
1. Overflow can occur only when adding two numbers that have the same
sign.
2. The carry-out signal from the sign-bit position is not a sufficient indicator
of over- flow when adding signed numbers.
A simple way to detect overflow is to examine the signs of the two
summands X and Y and the sign of the result. When both operands X and Y
have the same sign, an overflow occurs when the sign of S is not the same
as the signs of X and Y.
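This sign test can be sketched as follows for 4-bit two's-complement values (the helper name is ours):

```python
def add4_signed(x, y):
    """4-bit two's-complement addition with the sign-based overflow test:
    overflow iff both summands share a sign and the result does not."""
    s = (x + y) & 0b1111
    sx, sy, ss = x >> 3 & 1, y >> 3 & 1, s >> 3 & 1
    return s, sx == sy and ss != sx

# +7 and +4: sum bits 1011 (the code for -5), so overflow is flagged
print(add4_signed(0b0111, 0b0100))   # (11, True)
# -4 (1100) and -6 (1010): sum bits 0110 (+6), overflow again
print(add4_signed(0b1100, 0b1010))   # (6, True)
```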
Binary Addition
The binary addition of two bits (a and b) is defined by the table:
a b a + b
0 0 0
0 1 1 carry 0
1 0 1
1 1 0 carry 1
When adding n-bit values, the values are added in corresponding bit-wise
pairs, with each carry being added to the next most significant pair of bits.
The same algorithm can be used when adding pairs of unsigned or pairs of
signed values.
4-Bit Example A:
0 1 0 carry values
value 1: 1 0 1 1
+ value 2: + 0 0 1 0
result: 1 1 0 1
Since computers are constrained to deal with fixed-width binary values, any
carry out of the most significant bit-wise pair is ignored.
4-Bit Example B:
1 1 1 0 carry values
value 1: 1 0 1 1
+ value 2: + 0 1 1 0
result: 1 0 0 0 1
The binary values generated by the addition algorithm are always correct
with respect to the algorithm, but what is the significance when the binary
values are intended to represent application information? Will the operation
yield a result that accurately represents the result of adding the application
values?
First consider the case where the binary values are intended to represent
unsigned integers (i.e. counting numbers). Adding the binary values
representing two unsigned integers will give the correct result (i.e. will yield
the binary value representing the sum of the unsigned integer values)
provided the operation does not overflow – i.e. when the addition operation
is applied to the original unsigned integer values, the result is an unsigned
integer value that is inside the set of unsigned integer values that can be
represented using the specified number of bits (i.e. the result can be
represented under the fixed-width constraints imposed by the
representation).
Reconsider 4-Bit Example A (above) as adding unsigned values:
0 1 0 carry values
value 1: 11₁₀ 1 0 1 1
+ value 2: + 2₁₀ + 0 0 1 0
result: 13₁₀ 1 1 0 1
In this case, the binary result (1101₂) of the operation accurately represents
the unsigned integer sum (13) of the two unsigned integer values being
added (11 + 2), and therefore, the operation did not overflow. But what
about 4-Bit Example B (above)?
1 1 1 0 carry values
value 1: 11₁₀ 1 0 1 1
+ value 2: + 6₁₀ + 0 1 1 0
result: 17₁₀ 0 0 0 1 ???? 11 + 6 = 1 ?????
When the values added in Example B are considered as unsigned values,
then the 4-bit result (1) does not accurately represent the sum of the
unsigned values (11 + 6)! In this case, the operation has resulted in
overflow: the result (17) is outside the set of values that can be represented
using 4-bit binary number system values (i.e. 17 is not in the set {0 , … ,
15}). The result (0001₂) is correct according to the rules for performing
binary addition using fixed-width values, but truncating the carry out of the
most significant bit resulted in the loss of information that was important to
the encoding being used. If the carry had been kept, then the 5-bit result
(10001₂) would have represented the unsigned integer sum correctly.
But more can be learned about overflow from the above examples! Now
consider the case where the binary values are intended to represent signed
integers.
Reconsider 4-Bit Example A (above) as adding signed values:
0 1 0 carry values
value 1: – 5₁₀ 1 0 1 1
+ value 2: + 2₁₀ + 0 0 1 0
result: – 3₁₀ 1 1 0 1 no overflow !
In this case, the binary result (1101₂) of the operation accurately represents
the signed integer sum (– 3) of the two signed integer values being added
(– 5 + 2) therefore, the operation did not overflow. What about 4-Bit
Example B?
1 1 1 0 carry values
value 1: – 5₁₀ 1 0 1 1
+ value 2: + 6₁₀ + 0 1 1 0
result: 1₁₀ 0 0 0 1 no overflow!
In this case, the result (again) represents the signed integer answer
correctly, and therefore, the operation did not overflow.
Recall that in the unsigned case, Example B resulted in overflow. In the
signed case, Example B did not overflow. This illustrates an important
concept: overflow is interpretation dependent! The concept of overflow
depends on how information is represented as binary values. Different types
of information are encoded differently, yet the computer performs a specific
algorithm, regardless of the possible interpretations of the binary values
involved. It should not be surprising that applying the same algorithm to
different interpretations may have different overflow results.
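The two readings of Example B can be checked side by side in a short sketch (helper name is ours):

```python
def as_signed(v):
    """Interpret a 4-bit pattern as a two's-complement value."""
    return v - 16 if v & 0b1000 else v

# Example B revisited: the same bit pattern, two interpretations.
a_bits, b_bits = 0b1011, 0b0110
s_bits = (a_bits + b_bits) & 0b1111          # fixed-width 4-bit sum: 0001

# Unsigned view: 11 + 6 = 17, but the 4-bit result reads 1 -> overflow.
print(s_bits, a_bits + b_bits)               # 1 17
# Signed view: -5 + 6 = 1, and the result reads 1 -> no overflow.
print(as_signed(s_bits), as_signed(a_bits) + as_signed(b_bits))  # 1 1
```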
Subtraction
The binary subtraction of two bits (a and b) is defined by the table:
a b a – b
0 0 0
1 0 1 borrow 0
1 1 0
0 1 1 borrow 1
When subtracting n-bit values, the values are subtracted in corresponding
bit-wise pairs; with each borrow rippling down from the more significant bits
as needed. If none of the more significant bits contains a 1 to be borrowed,
then 1 may be borrowed into the most significant bit.
4-Bit Example C:
must borrow from the second digit
1 0 1 0
– 0 0 0 1
becomes (the 1 in the twos place is borrowed, leaving 0 there and 10 in the ones place):
Interpretations: unsigned signed
1 0 0 10 10 – 6
– 0 0 0 1 – 1 – 1
1 0 0 1 9 – 7
no overflow in either case
4-Bit Example D:
must borrow from above the most significant digit
0 0 0 1
– 1 1 1 1
becomes (borrows ripple in from above the most significant digit):
Interpretations: unsigned signed
10 10 10 1 1 1
– 1 1 1 1 – 15 – (–1)
0 0 1 0 2 2
overflow in the unsigned case; no overflow in the signed case
Most computers apply the mathematical identity:
a – b = a + ( – b )
to perform subtraction by negating the second value (b) and then adding.
This can result in a saving in transistors since there is no need to implement
a subtraction circuit.
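A sketch of this negate-and-add scheme for 4-bit values (function names are ours):

```python
def negate4(b):
    """Two's-complement negation of a 4-bit value: invert the bits, add 1."""
    return (~b + 1) & 0b1111

def sub4(a, b):
    """a - b computed as a + (-b), so no separate subtraction circuit."""
    return (a + negate4(b)) & 0b1111

print(sub4(0b1010, 0b0001))   # 9 (binary 1001), matching Example C
```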
Another Note on Overflow
Are there easy ways to decide whether an addition or subtraction results in
overflow? Yes, but we should be careful that we understand the concept,
and don’t rely on memorizing case rules that allow the occurrence of
overflow to be identified.
For unsigned values, a carry out of (or a borrow into) the most significant bit
indicates that overflow has occurred.
For signed values, overflow has occurred when the sign of the result is
impossible for the signs of the values being combined by the operation. For
example, overflow has occurred if:
Two positive values are added and the sign of the result is negative
A negative value is subtracted from a positive value and the result is
negative (a positive minus a negative is the same as a positive plus a
positive, and should result in a positive value, i.e. a – ( – b) = a + b )
These are just two examples of some of the possible cases for signed
overflow.
Note that it is not possible to overflow for some signed values. For example,
adding a positive and a negative value will never overflow. To convince
yourself of why this is the case, picture the two values on a number line:
suppose that a is a negative value, and b is a positive value. Adding the two
values, a + b, will result in c such that c always lies between a and b on the
number line. If a and b can be represented prior to the addition, then c can
also be represented, and overflow will never occur.
Multiplication
Multiplication is a slightly more complex operation than addition or
subtraction. Multiplying two n-bit values together can result in a value of up
to 2n bits. To help convince you of this, think about decimal numbers:
multiplying two 1-digit numbers together results in a 1- or 2-digit result, but
cannot result in a 3-digit result (the largest product possible is 9 x 9 = 81).
What about multiplying two 2-digit numbers? Does this extrapolate to n-digit
numbers? To further complicate matters, there is a reasonably simple
algorithm for multiplying binary values that represent unsigned integers, but
the same algorithm cannot be applied directly to values that represent
signed values (this is different from addition and subtraction where the same
algorithms can be applied to values that represent unsigned or signed
values!).
Overflow is not an issue in n-bit unsigned multiplication, provided that 2n
bits of result are kept.
Now consider the multiplication of two unsigned 4-bit values a and b. The
value b can be rewritten in terms of its individual digits:
b = b3 · 2^3 + b2 · 2^2 + b1 · 2^1 + b0 · 2^0
Substituting this into the product a · b gives:
a (b3 · 2^3 + b2 · 2^2 + b1 · 2^1 + b0 · 2^0)
which can be expanded into:
a b3 · 2^3 + a b2 · 2^2 + a b1 · 2^1 + a b0 · 2^0
The possible values of bi are 0 or 1. In the expression above, any term
where bi = 0 resolves to 0 and the term can be eliminated. Furthermore, in
any term where bi = 1, the digit bi is redundant (multiplying by 1 gives the
same value), and therefore the digit bi can be eliminated from the term. The
resulting expression can be written and generalized to n bits:
a · b = SUM(i = 0 to n-1) a · 2^i, taking only those i where bi = 1
This expression may look a bit intimidating, but it turns out to be reasonably
simple to implement in a computer because it only involves multiplying by 2
(and there is a trick that lets computers do this easily!). Multiplying a value
by 2 in the binary number system is analogous to multiplying a value by 10
in the decimal number system. The result has one new digit: a 0 is injected
as the new least significant digit, and all of the original digits are shifted to
the left as the new digit is injected.
Think in terms of a decimal example, say: 37 × 10 = 370. The original value
is 37 and the result is 370. The result has one more digit than the original
value, and the new digit is a 0 that has been injected as the least significant
digit. The original digits (37) have been shifted one digit to the left to admit
the new 0 as the least significant digit.
The same rule holds good for multiplying by 2 in the binary number system.
For example: 101₂ × 2 = 1010₂. The original value of 5 (101₂) is multiplied
by 2 to give 10 (1010₂). The result can be obtained by shifting the original
value left one digit and injecting a 0 as the new least significant digit.
The calculation of a product can be reduced to summing terms of the form
a · 2^i. The multiplication by 2^i can be reduced to shifting left i times! The
shifting of binary values in a computer is very easy to do, and as a result,
the calculation can be reduced to a series of shifts and adds.
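In Python, the left-shift operator makes this "trick" explicit:

```python
a = 0b101                    # 5
# Shifting left one bit injects a 0 as the new LSB: multiply by 2.
print(bin(a << 1))           # 0b1010 (= 10)
# Shifting left i bits multiplies by 2**i.
print(bin(a << 3))           # 0b101000 (= 40)
```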
Unsigned Integer Multiplication: Straightforward Method
To compute: a * b
Where
Register A contains a = an-1an-2...a1a0
Register B contains b = bn-1bn-2...b1b0
and where register P is a register twice as large as A or B
Simple Multiplication Algorithm: Straightforward Method
Steps:
1. If LSB(A) = 1, then set Pupper to Pupper + b
2. Shift the register A right,
using zero sign extension (for unsigned values) and
forcing the LSB(A) to fall off the lower end
3. Shift the double register P right,
using zero sign extension (for unsigned values) and
pushing the LSB(Pupper) into the MSB(Plower)
After n times (for n-bit values),
the full contents of the double register P = a * b
Example of Simple Multiplication Algorithm (Straightforward Method)
Multiply b = 2 = 0010₂ by a = 3 = 0011₂
(Answer should be 6 = 0110₂)
P A B Comments
0000 0000 0011 0010 Start: A = 0011; B = 0010; P = (0000, 0000)
0010 0000 0011 0010 LSB(A) = 1 ==> Add b to P
0010 0000 0001 0010 Shift A right
0001 0000 0001 0010 Shift P right
0011 0000 0001 0010 LSB(A) = 1 ==> Add b to P
0011 0000 0000 0010 Shift A right
0001 1000 0000 0010 Shift P right
0001 1000 0000 0010 LSB(A) = 0 ==> Do nothing
0001 1000 0000 0010 Shift A right
0000 1100 0000 0010 Shift P right
0000 1100 0000 0010 LSB(A) = 0 ==> Do nothing
0000 1100 0000 0010 Shift A right
0000 0110 0000 0010 Shift P right
0000 0110 0000 0010 Done: P is product
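The steps above can be sketched in Python; the function below mirrors the table (the name and the n=4 default are ours):

```python
def multiply_straightforward(a, b, n=4):
    """Unsigned shift-add multiply, straightforward method: P is a 2n-bit
    register; b is added into Pupper whenever LSB(A) is 1."""
    P = 0
    for _ in range(n):
        if a & 1:                      # LSB(A) = 1
            P += b << n                # add b into Pupper
        a >>= 1                        # shift A right (zero extension)
        P >>= 1                        # shift the double register P right
    return P                           # full 2n-bit product

print(multiply_straightforward(3, 2))  # 6
```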
Unsigned Integer Multiplication: A More Efficient Method
To compute: a * b
where
Register A contains a = an-1an-2...a1a0
Register B contains b = bn-1bn-2...b1b0
and where register P is connected to register A to form a register twice
as large as A or B and register P is the upper part of the double register
(P, A)
Simple Multiplication Algorithm: A More Efficient Method
Steps:
1. If LSB(A) = 1, then set P to P + b
2. Shift the double register (P, A) right,
using zero sign extension (for unsigned values),
forcing the LSB(A) to fall off the lower end and
pushing the LSB(P) into the MSB(A)
After n times (for n-bit values),
the contents of the double register (P, A) = a * b
Example of Simple Multiplication Algorithm (A More Efficient Method)
Multiply b = 2 = 0010₂ by a = 3 = 0011₂
(Answer should be 6 = 0110₂)
P A B Comments
0000 0011 0010 Start: A = 0011; B = 0010; P = (0000, 0000)
0010 0011 0010 LSB(A) = 1 ==> Add b to P
0001 0001 0010 Shift (P, A) right
0011 0001 0010 LSB(A) = 1 ==> Add b to P
0001 1000 0010 Shift (P, A) right
0001 1000 0010 LSB(A) = 0 ==> Do nothing
0000 1100 0010 Shift (P, A) right
0000 1100 0010 LSB(A) = 0 ==> Do nothing
0000 0110 0010 Shift (P, A) right
0000 0110 0010 Done: P is product
By combining registers P and A, we can eliminate one extra shift step per
iteration.
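A sketch of this combined-register variant, treating the 2n-bit (P, A) pair as a single integer (name is ours):

```python
def multiply_combined(a, b, n=4):
    """Shift-add multiply with (P, A) held as one 2n-bit register, so a
    single shift moves both halves per iteration."""
    PA = a                             # A in the low n bits, P (= 0) above
    for _ in range(n):
        if PA & 1:                     # LSB(A)
            PA += b << n               # add b into P, the upper half
        PA >>= 1                       # one shift of the whole (P, A) pair
    return PA                          # (P, A) = a * b

print(multiply_combined(3, 2))         # 6
```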
Positive Integer Multiplication
Shift-Add Multiplication Algorithm
(Same as Straightforward Method above)
For unsigned or positive operands
Repeat the following steps n times:
1. If LSB(A) = 1, then set Pupper to Pupper + b, else set Pupper to Pupper + 0
2. Shift the double register (P, A) right,
using zero sign extension (for positive values) and
forcing the LSB(A) to fall off the lower end
Signed Integer Multiplication
Simplest:
1. Convert negative values of a or b to positive values
2. Multiply both positive values using one of the two algorithms above
3. Adjust the sign of the product appropriately
Alternate: Booth's algorithm
Introduction to Booth's Multiplication Algorithm
A powerful algorithm for signed-number multiplication is the Booth algorithm.
It generates a 2n-bit product and treats both positive and negative numbers
uniformly.
Consider a positive binary number containing a run of ones, e.g., the 8-bit
value 00011110. Multiplying by such a value implies four consecutive
additions of shifted multiplicands:
00010100 20
x 00011110 x 30
00000000 00
00010100 first 60
00010100 second
00010100 third
00010100 fourth
00000000
00000000
00000000
0000001001011000 600
Now 00011110 can be expressed as a difference of powers of two:
00100000 32
– 00000010 – 2
00011110 30
This means that the same multiplication can be obtained using only two
additions:
+ 2^5 x multiplicand of 00010100
- 2^1 x multiplicand of 00010100
Since the 1s complement of 00010100 is 11101011,
Then -00010100 is 11101100 in 2s complement
In other words, using sign extension and ignoring the carry out of the
16-bit result, the same product can be formed from just two rows:
- 2^1 x 00010100 gives 1111111111011000 (– 40)
+ 2^5 x 00010100 gives 0000001010000000 (+ 640)
sum: 1|0000001001011000 = 0000001001011000 (600)
Booth's Recoding Table
Booth recodes the bits of the multiplier a according to the following table:
ai ai-1 Recoded ai
0 0 0
0 1 +1
1 0 -1
1 1 0
Always assume that there is a zero to the right of the multiplier
i.e. that a-1 is zero
so that we can consider the LSB (= a0)
and an imaginary zero bit to its right
So, if a is 00011110 ,
the recoded value of a is
0 0 +1 0 0 0 -1 0
Example of Booth's Multiplication
A. Multiply 00101101 (b) by 00011110 (a) using normal multiplication
00101101 45
x 00011110 x 30
00000000 00
00101101 first 135
00101101 second
00101101 third
00101101 fourth
00000000
00000000
00000000
0000010101000110 1350
B. Multiply 00101101 (b) by 00011110 (a) using Booth's multiplication
Recall the recoded value of 00011110 (a) is 0 0 +1 0 0 0 -1 0
The 1s complement of 00101101 (b) is 11010010
The 2s complement of 00101101 (b) is 11010011
Hence, using sign extension and ignoring the carry out of the 16-bit
result, the product is formed from two rows:
- 2^1 x 00101101 gives 1111111110100110 (– 90)
+ 2^5 x 00101101 gives 0000010110100000 (+ 1440)
sum: 1|0000010101000110 = 0000010101000110 (1350)
Booth's Multiplication Algorithm
Booth's algorithm chooses between + b and - b depending on the two
current bits of a .
Algorithm:
1. Assume the existence of bit a-1 = 0 initially
2. Repeat n times (for multiplying two n-bit values):
a. At step i:
i. If ai = 0 and ai-1 = 0, add 0 to register P
ii. If ai = 0 and ai-1 = 1, add b to register P (i.e. treat ai as +1)
iii. If ai = 1 and ai-1 = 0, subtract b from register P (i.e. treat ai as -1)
iv. If ai = 1 and ai-1 = 1, add 0 to register P
b. Shift the double register (P, A) right one bit with sign extension
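The algorithm can be sketched as follows; the masking mimics n-bit registers and the arithmetic right shift with sign extension, and from_twos is just a display helper (both names are ours):

```python
def booth_multiply(a, b, n=4):
    """Booth's algorithm for n-bit two's-complement operands; the 2n-bit
    product accumulates in the double register (P, A)."""
    mask = (1 << n) - 1
    P, A = 0, a & mask
    neg_b = (-b) & mask                 # 2s complement of b
    a_prev = 0                          # the imaginary bit a_-1 = 0
    for _ in range(n):
        a_i = A & 1
        if (a_i, a_prev) == (1, 0):     # recoded -1: subtract b
            P = (P + neg_b) & mask
        elif (a_i, a_prev) == (0, 1):   # recoded +1: add b
            P = (P + (b & mask)) & mask
        a_prev = a_i
        # arithmetic right shift of (P, A): LSB(P) moves into MSB(A),
        # and the sign bit of P is duplicated (sign extension)
        A = (A >> 1) | ((P & 1) << (n - 1))
        P = (P >> 1) | (P & (1 << (n - 1)))
    return (P << n) | A                 # 2n-bit two's-complement pattern

def from_twos(v, bits):
    """Display helper: read a bit pattern as a two's-complement value."""
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

print(from_twos(booth_multiply(6, 2), 8))                 # 12
print(from_twos(booth_multiply(-6 & 0xF, -5 & 0xF), 8))   # 30
```

Tracing booth_multiply(6, 2) step by step reproduces the (P, A) rows of the positive-number example below.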
Justification of Booth's Multiplication Algorithm
Consider the operations described above:
ai ai-1 ai-1 - ai Operation
0 0 0 Add 0xb to P
0 1 +1 Add +1xb to P
1 0 -1 Add -1xb to P
1 1 0 Add 0xb to P
Equivalently, the algorithm could state:
2. Repeat n times (for multiplying two n-bit values): At step i, add
(ai-1 - ai) x b to register P
The result of all n steps is the sum:
(a-1 - a0) x 2^0 x b
+ (a0 - a1) x 2^1 x b
+ (a1 - a2) x 2^2 x b ...
+ (an-3 - an-2) x 2^(n-2) x b
+ (an-2 - an-1) x 2^(n-1) x b
where a-1 is assumed to be 0
This sum is equal to b x SUM(i = 0 to n-1) ((ai-1 - ai) x 2^i)
= b(-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0) + b a-1
= b(-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0)
since a-1 = 0
Now consider the representation of a as a 2s complement number.
It can be shown to be the same as
-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0
where an-1 represents the sign of a:
If an-1 = 0, a is a positive number
If an-1 = 1, a is a negative number
Example:
Let w = 1011₂
Then w = -1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = -8 + 2 + 1 = -5
Thus the sum above,
b(-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0),
is the same as the 2s complement representation of b x a
Example of Booth's Multiplication Algorithm (Positive Numbers)
Multiply b = 2 = 0010₂ by a = 6 = 0110₂
(Answer should be 12 = 1100₂)
The 2s complement of b is 1110
P A B Comments
0000 0110 [0] 0010 Start: a-1 = 0
0000 0110 [0] 0010 00 ==> Add 0 to P
0000 0011 [0] 0010 Shift right
1110 0011 [0] 0010 10 ==> Subtract b from P
1111 0001 [1] 0010 Shift right
1111 0001 [1] 0010 11 ==> Add 0 to P
1111 1000 [1] 0010 Shift right
0001 1000 [1] 0010 01 ==> Add b to P
0000 1100 [0] 0010 Shift right
0000 1100 [0] 0010 Done: (P, A) is product
Example of Booth's Multiplication Algorithm (Negative Numbers)
Multiply b = -5 = -(0101)₂ = 1011₂ by a = -6 = -(0110)₂ = 1010₂
(Answer should be 30 = 11110₂)
The 2s complement of b is 0101
P A B Comments
0000 1010 [0] 1011 Start: a-1 = 0
0000 1010 [0] 1011 00 ==> Add 0 to P
0000 0101 [0] 1011 Shift right
0101 0101 [0] 1011 10 ==> Subtract b from P
0010 1010 [1] 1011 Shift right
1101 1010 [1] 1011 01 ==> Add b to P
1110 1101 [0] 1011 Shift right
0011 1101 [0] 1011 10 ==> Subtract b from P
0001 1110 [1] 1011 Shift right
0001 1110 [1] 1011 Done: (P, A) is product
Advantages and Disadvantages of Booth's Algorithm
Advantages:
Handles positive and negative numbers uniformly
Efficient when there are long runs of ones in the multiplier
Disadvantages:
Average speed of algorithm is about the same as with the normal
multiplication algorithm.
Worst case operates at a slower speed than the normal multiplication
algorithm.
Division
Terminology: dividend ÷ divisor = quotient & remainder
The implementation of division in a computer raises several practical issues:
For integer division there are two results: the quotient and the
remainder.
The operand sizes (number of bits) to be used in the algorithm must be
considered (i.e. the sizes of the dividend, divisor, quotient and
remainder).
Overflow is not an issue in unsigned multiplication, but is a concern with
division.
As with multiplication, there are differences in the algorithms for signed
vs. unsigned division.
Recall that multiplying two n-bit values can result in a 2n-bit value. Division
algorithms are often designed to be symmetrical with this by specifying:
the dividend as a 2n-bit value
the divisor, quotient and remainder as n-bit values
Once the operand sizes are set, the issue of overflow may be addressed.
For example, suppose that the above operand sizes are used, and that the
dividend value is larger than a value that can be represented in n bits (i.e.
2^n – 1 < dividend). Dividing by 1 (divisor = 1) should result with quotient =
dividend; however, the quotient is limited to n bits, and therefore is
incapable of holding the correct result. In this case, overflow would occur.
Sign Extension / Zero Extension
When loading a 16-bit value into a 32-bit register,
How is the sign retained?
By loading the 16-bit value into the lower 16 bits of the 32-bit register and
By duplicating the MSB of the 16-bit value throughout the upper 16 bits of
the 32-bit register
This is called sign extension
If instead, the upper bits are always set to zero,
It is called zero extension
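Both extensions can be sketched with plain integer operations (function names are ours):

```python
def sign_extend(value, from_bits, to_bits):
    """Duplicate the MSB of a from_bits-wide pattern through the upper bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1) & 1:                      # MSB is set
        value |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return value

def zero_extend(value, from_bits, to_bits):
    """The upper bits are simply set to zero."""
    return value & ((1 << from_bits) - 1)

print(hex(sign_extend(0x8001, 16, 32)))   # 0xffff8001
print(hex(zero_extend(0x8001, 16, 32)))   # 0x8001
```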
Unsigned Integer Division
To compute: a / b
where
Register A contains a = an-1an-2...a1a0
Register B contains b = bn-1bn-2...b1b0
and where register P is connected to register A to form a register twice as
large as A or B and register P is the upper part of the double register as
done in many of the Multiplication methods.
Simple Unsigned Division
– for unsigned operands
Steps:
1. Shift the double register (P, A) one bit left,
using zero sign extension (for unsigned values)
and forcing the MSB(P) to fall off the upper end
2. Subtract b from P
3. If the result is negative, then set LSB(A) to 0, else set LSB(A) to 1
4. If the result is negative, set P to P + b (restore)
After repeating these steps n times (for n-bit values), the contents of register
A = a / b, and the contents of register P = remainder (a / b)
This is also called restoring division
Restoring Division Example
Divide 14 = 1110₂ by 3 = 0011₂
P A B Comments
+00000 1110 0011 Start
+00001 1100 0011 Shift left
-00010 1100 0011 Subtract b
+00001 1100 0011 Restore
+00001 1100 0011 Set LSB(A) to 0
+00011 1000 0011 Shift left
+00000 1000 0011 Subtract b
+00000 1001 0011 Set LSB(A) to 1
+00001 0010 0011 Shift left
-00010 0010 0011 Subtract b
+00001 0010 0011 Restore
+00001 0010 0011 Set LSB(A) to 0
+00010 0100 0011 Shift left
-00001 0100 0011 Subtract b
+00010 0100 0011 Restore
+00010 0100 0011 Set LSB(A) to 0
+00010 0100 0011 Done
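A sketch of restoring division mirroring the table above (the name and n=4 default are ours):

```python
def restoring_divide(a, b, n=4):
    """Restoring division of an n-bit unsigned dividend a by divisor b.
    (P, A) starts as (0, a); after n steps A is the quotient, P the remainder."""
    P, A = 0, a
    for _ in range(n):
        # shift (P, A) left: MSB(A) moves into LSB(P)
        P = (P << 1) | ((A >> (n - 1)) & 1)
        A = (A << 1) & ((1 << n) - 1)
        P -= b                          # trial subtraction
        if P < 0:
            P += b                      # negative: restore, LSB(A) stays 0
        else:
            A |= 1                      # non-negative: set LSB(A) to 1
    return A, P

print(restoring_divide(14, 3))          # (4, 2)
```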
Restoring versus Non-restoring Division
Let r = the contents of (P, A)
At each step, the restoring algorithm computes
P = (2r - b)
If (2r - b) < 0,
then
(P, A) = 2r (restored)
(P, A) = 4r (shifted left)
(P, A) = 4r - b (subtract b for next step)
If (2r - b) < 0
and there is no restoring,
then
(P, A) = 2r - b (not restored)
(P, A) = 4r - 2b (shifted left)
(P, A) = 4r - b (add b for next step)
Non-restoring Division Algorithm
Steps:
1. If P is negative,
a) Shift the double register (P, A) one bit left
b) Add b to P
2. Else if P is not negative,
a) Shift the double register (P, A) one bit left
b) Subtract b from P
3. If P is negative, then set LSB(A) to 0 else set LSB(A) to 1
After repeating these steps n times (for n-bit values),
if P is negative, do a final restore
i.e. add b to P
Then
the contents of register A = a / b, and
the contents of register P = remainder (a / b)
Non-restoring Division Example
Divide 14 = 1110₂ by 3 = 0011₂
P A B Comments
00000 1110 0011 Start
00001 1100 0011 Shift left
11110 1100 0011 Subtract b
11110 1100 0011 P negative; Set LSB(A) to 0
11101 1000 0011 Shift left
00000 1000 0011 P negative; Add b
00000 1001 0011 P positive; Set LSB(A) to 1
00001 0010 0011 Shift left
11110 0010 0011 Subtract b
11110 0010 0011 P negative; Set LSB(A) to 0
11100 0100 0011 Shift left
11111 0100 0011 P negative; Add b
11111 0100 0011 P negative; Set LSB(A) to 0
00010 0100 0011 P was negative; final restore (add b)
00010 0100 0011 Done
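And a sketch of the non-restoring variant, with a single add or subtract per step and the final restore (name is ours):

```python
def nonrestoring_divide(a, b, n=4):
    """Non-restoring division: one add or subtract per step instead of a
    subtract plus a conditional restore."""
    P, A = 0, a
    for _ in range(n):
        # shift (P, A) left: MSB(A) moves into LSB(P)
        P = (P << 1) | ((A >> (n - 1)) & 1)
        A = (A << 1) & ((1 << n) - 1)
        if P < 0:
            P += b                      # P negative: add b
        else:
            P -= b                      # P non-negative: subtract b
        if P < 0:
            A &= ~1                     # set LSB(A) to 0
        else:
            A |= 1                      # set LSB(A) to 1
    if P < 0:                           # final restore if needed
        P += b
    return A, P

print(nonrestoring_divide(14, 3))       # (4, 2)
```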
Division by Multiplication
There are numerous algorithms for division many of which do not relate as
well to the division methods learned in elementary school One of these is
Division by Multiplication It is also known as Goldschmidt's algorithm.
If we wish to divide N by D, then we are looking for Q such that Q = N/D
Since N/D is a fraction, we can multiply both the numerator and denominator
by the same value, x, without changing the value of Q. This is the basic idea
of the algorithm. We wish to find some value x such that Dx becomes close
to 1 so that we only need to compute Nx To find such an x, first scale D by
shifting it right or left so that 1/2 <= D < 1
Let s = #shifts, positive if to the left
and negative otherwise.
Call this D' (= D · 2^s).
Now compute x such that x = 1 - D'.
Call this Z.
Notice that 0 < Z <= 1/2
since Z = 1 - D'.
Furthermore, D' = 1 - Z.
Then
Q = N / D
= N (1+Z) / D (1+Z)
= 2s N (1+Z) / 2s D (1+Z)
= 2s N (1+Z) / D' (1+Z)
= 2s N (1+Z) / (1-Z) · (1+Z)
= 2s N (1+Z) / (1-Z2)
Similarly,
Q = N / D
  = 2^s N (1+Z) / (1-Z^2)
  = 2^s N (1+Z) (1+Z^2) / (1-Z^2) (1+Z^2)
  = 2^s N (1+Z) (1+Z^2) / (1-Z^4)
And,
Q = N / D
  = 2^s N (1+Z) (1+Z^2) / (1-Z^4)
  = 2^s N (1+Z) (1+Z^2) (1+Z^4) / (1-Z^4) (1+Z^4)
  = 2^s N (1+Z) (1+Z^2) (1+Z^4) / (1-Z^8)
continuing to
Q = N / D
  = 2^s N (1+Z) (1+Z^2) (1+Z^4) · · · (1+Z^(2^(n-1))) / (1-Z^(2^n))
Since 0 < Z <= 1/2,
Z^i goes to zero as i goes to infinity. This means that the denominator,
1-Z^(2^n), goes to 1 as n gets larger. Since the denominator goes to 1, we
need only compute the numerator in order to determine (or approximate) the
quotient, Q.
So, Q ≈ 2^s N (1+Z) (1+Z^2) (1+Z^4) · · · (1+Z^(2^(n-1)))
Examples of Division by Multiplication
Example 1:
Let's start with a simple example in the decimal system.
Suppose N = 1.8 and D = 0.9.
(Clearly, the answer is 2.)
Since D = 0.9, it does not need to be shifted,
so the number of shifts, s, is zero.
And since we are in the decimal system,
we use 10^s in the equation for Q.
Furthermore, Z = 1 - D = 1 - 0.9 = 0.1.
For n = 0,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8
  = 1.8
For n = 1,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8 (1 + Z)
  = 1.8 (1 + 0.1)
  = 1.8 · 1.1
  = 1.98
For n = 2,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8 (1 + Z) (1 + Z^2)
  = 1.8 (1 + 0.1) (1 + 0.01)
  = 1.8 · 1.1 · 1.01
  = 1.98 · 1.01
  = 1.9998
For n = 3,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8 (1 + Z) (1 + Z^2) (1 + Z^4)
  = 1.8 (1 + 0.1) (1 + 0.01) (1 + 0.0001)
  = 1.8 · 1.1 · 1.01 · 1.0001
= 1.98 · 1.01 · 1.0001
= 1.9998 · 1.0001
= 1.99999998
And so on, getting closer and closer to 2.0 with each increase of n.
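The same iteration is easy to try out in code. The sketch below uses ordinary Python floats rather than the fixed-point fractions real hardware would use, and the function name is mine; it scales D into [1/2, 1), forms Z = 1 - D', and multiplies the scaled numerator by successive (1 + Z^(2^i)) factors:

```python
def goldschmidt_divide(N, D, iterations=5):
    """Division by multiplication (Goldschmidt): a float-based sketch."""
    # Scale D into [1/2, 1); s counts left shifts (negative = right shifts).
    s, d = 0, D
    while d < 0.5:
        d *= 2
        s += 1
    while d >= 1.0:
        d /= 2
        s -= 1
    Z = 1.0 - d                  # 0 < Z <= 1/2
    q = N * (2.0 ** s)           # numerator scaled by 2^s
    for _ in range(iterations):
        q *= 1.0 + Z             # multiply by (1 + Z^(2^i))
        Z = Z * Z                # square Z for the next factor
    return q

print(goldschmidt_divide(1.8, 0.9, 3))   # ≈ 1.99999998, approaching 2.0
print(goldschmidt_divide(7.0, 2.0, 10))  # ≈ 3.5
```

Each extra iteration squares Z, so the error shrinks quadratically, which is why only a handful of iterations are needed in practice.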
8.3 Floating Point Numbers
Instead of using the obvious representation of rational numbers presented in
the previous section, most computers use a different representation of a
subset of the rational numbers. We call these numbers floating-point
numbers.
Floating-point numbers use inexact arithmetic, and in return require only a
fixed-size representation. For many computations (so-called scientific
computations, as if other computations weren't scientific) such a
representation has the great advantage that it is fast, while at the same time
usually giving adequate precision.
There are some (sometimes spectacular) exceptions to the “adequate
precision” statement in the previous paragraph, though. As a result, an
entire discipline of applied mathematics, called numerical analysis, has been
created for the purpose of analyzing how algorithms behave with respect to
maintaining adequate precision, and of inventing new algorithms with better
properties in this respect.
The basic idea behind floating-point numbers is to represent a number as a
mantissa and an exponent, each with a fixed number of bits of precision. If
we denote the mantissa with m and the exponent with e, then the number
thus represented is m · 2^e.
Again, we have a problem that a number can have several representations.
To obtain a canonical form, we simply add a rule that m must be greater
than or equal to 1/2 and strictly less than 1. If we write such a mantissa in
binal (analogous to decimal) form, we always get a number that starts with
0.1. This initial information therefore does not have to be represented, and
we represent only the remaining “binals”.
The reason floating-point representations work well for so-called scientific
applications is that we more often need to multiply or divide two numbers.
Multiplication of two floating-point numbers is easy to obtain. It suffices to
multiply the mantissas and add the exponents. The resulting mantissa might
be smaller than 1/2, in fact, it can be as small as 1/4. In this case, the result
needs to be canonicalized. We do this by shifting the mantissa left by one
position and subtracting one from the exponent. Division is only slightly
more complicated. Notice that the imprecision in the result of a multiplication
or a division is only due to the imprecision in the original operands. No
additional imprecision is introduced by the operation itself (except possibly 1
unit in the least significant digit). Floating-point addition and subtraction do
not have this property.
To add two floating-point numbers, the one with the smallest exponent must
first have its mantissa shifted right by n steps, where n is the difference of
the exponents. If n is greater than the number of bits in the representation of
the mantissa, the second number will be treated as 0 as far as addition is
concerned. The situation is even worse for subtraction (or addition of one
positive and one negative number). If the numbers have roughly the same
absolute value, the result of the operation is roughly zero, and the resulting
representation may have no correct significant digits.
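Both effects described above are easy to demonstrate with ordinary double-precision arithmetic:

```python
# Absorption: when the exponent difference exceeds the significand
# width, the smaller operand is shifted out entirely during alignment.
big = 2.0 ** 53
print(big + 1.0 == big)       # True: the 1.0 vanishes

# Cancellation: subtracting nearly equal numbers can leave no
# correct significant digits at all.
print((1.0 + 1e-16) - 1.0)    # 0.0, although the exact answer is 1e-16
```

In the second case 1e-16 is below half a unit in the last place of 1.0, so the addition already rounds back to 1.0 and the subtraction returns exactly zero.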
The two's complement representation that we have mentioned above is
mostly useful for addition and subtraction. It only complicates things for
multiplication and division. For multiplication and division, it is better to use a
representation with sign + absolute value. Since multiplication and division
are more common with floating-point numbers, and since they result in
multiplication and division of the mantissa, it is more advantageous to have
the mantissa represented as sign + absolute value. The exponents are
added, so it is more common to use two’s complement (or some related
representation) for the exponent.
Usually, computers manipulate data in chunks of 8, 16, 32, 64, or 128 bits. It
is therefore useful to fit a single floating-point number with both mantissa
and exponent in such a chunk. In such a chunk, we need to have room for
the sign (1 bit), the mantissa, and the exponent. While there are many
different ways of dividing the remaining bits between the mantissa and the
exponent, in practice most computers now use a standard called IEEE 754,
which mandates the formats shown in figure 8.1.
Figure 8.1: Formats of floating point numbers
Floating Point Variables
Floating point variables have been represented in many different ways
inside computers of the past. But there is now a well adhered to standard for
the representation of floating point variables. The standard is known as the
IEEE Floating Point Standard (FPS). Like scientific notation, FPS represents
numbers with multiple parts, a sign bit, one part specifying the mantissa and
a part representing the exponent. The mantissa is represented as a signed
magnitude integer (i.e., not 2's complement), where the value is normalized.
The exponent is represented as an unsigned integer which is biased to
accommodate negative numbers. An 8-bit unsigned value would normally
have a range of 0 to 255, but 127 is added to the exponent; since the
all-zeros and all-ones patterns are reserved, this gives an effective range
of -126 to +127.
Follow these steps to convert a number to FPS format.
1. First convert the number to binary.
2. Normalize the number so that there is one nonzero digit to the left of the
binary place, adjusting the exponent as necessary.
3. The digits to the right of the binary point are then stored as the mantissa
starting with the most significant bits of the mantissa field. Because all
numbers are normalized, there is no need to store the leading 1.
Note: Because the leading 1 is dropped, it is no longer proper to refer to
the stored value as the mantissa. In IEEE terms, this mantissa minus its
leading digit is called the significand.
4. Add 127 to the exponent and convert the resulting sum to binary for the
stored exponent value. For double precision, add 1023 to the exponent.
Be sure to include all 8 or 11 bits of the exponent.
5. The sign bit is a one for negative numbers and a zero for positive
numbers.
6. Compilers often express FPS numbers in hexadecimal, so a quick
conversion to hexadecimal might be desired.
Here are some examples using single precision FPS.
3.5 = 11.1 (binary)
    = 1.11 x 2^1; sign = 0, significand = 1100...,
exponent = 1 + 127 = 128 = 10000000
FPS number (3.5) = 0 10000000 11000000000000000000000
                 = 0x40600000
100 = 1100100 (binary)
    = 1.100100 x 2^6; sign = 0, significand = 100100...,
exponent = 6 + 127 = 133 = 10000101
FPS number (100) = 0 10000101 10010000000000000000000
                 = 0x42c80000
What decimal number is represented in FPS as 0xc2508000?
Here we just reverse the steps.
0xc2508000 = 11000010010100001000000000000000 (binary)
sign = 1; exponent = 10000100; significand = 10100001000000000000000
exponent = 132 ==> 132 - 127 = 5
-1.10100001 x 2^5 = -110100.001 = -52.125
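These worked examples can be checked mechanically. Python's standard struct module packs a float into its IEEE single-precision bit pattern, so the following sketch (the function names are mine) reproduces all three results:

```python
import struct

def float_to_fps(x):
    """Return the single-precision (FPS) bit pattern of x as hex."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return "0x%08x" % bits

def fps_to_float(bits):
    """Decode a 32-bit FPS pattern back to the number it represents."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

print(float_to_fps(3.5))          # 0x40600000
print(float_to_fps(100.0))        # 0x42c80000
print(fps_to_float(0xC2508000))   # -52.125
```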
Floating Point Arithmetic
Until fairly recently, floating point arithmetic was performed using complex
algorithms with an integer ALU, and the main ALU in CPUs is still an
integer ALU. However, in the mid-1980s, special hardware was developed
to perform floating point arithmetic. Intel, for example, sold a chip
known as the 80387 which was a math co-processor to go along with the
80386 CPU. Most people did not buy the 80387 because of the cost. A
major selling point of the 80486 was that the math co-processor was
integrated onto the CPU which eliminated the purchase of a separate chip to
get faster floating point arithmetic.
Floating point hardware usually has a special set of registers and
instructions for performing floating point arithmetic. There are also special
instructions for moving data between memory or the normal registers and
the floating point registers.
Addition of Floating-Point Numbers
The steps (or stages) of a floating-point addition:
1. The exponents of the two floating-point numbers to be added are
compared to find the number with the smaller exponent.
2. The significand of the number with the smaller exponent is shifted so
that the exponents of the two numbers agree.
3. The significands are added.
4. The result of the addition is normalized.
5. Checks are made to see if any floating-point exceptions occurred during
the addition, such as overflow.
6. Rounding occurs.
Floating-Point Addition Example
Example: s = x + y
numbers to be added are x = 1234.00 and y = -567.8
these are represented in decimal notation with a mantissa (significand)
of four digits
six stages (A - F) are required to complete the addition
Step   A           B            C           D          E          F
X      0.1234E4    0.12340E4
Y     -0.5678E3   -0.05678E4
S                               0.06662E4   0.6662E3   0.6662E3   0.6662E3
(For this example, we are throwing out biased exponents and the assumed
1.0 before the magnitude. Also all numbers are in the decimal number
system and no complements are used.)
Time for Floating-Point Addition
Consider a set of floating-point additions sequentially following one another
(as in adding the elements of two arrays)
Assume that each stage of the addition takes t time units.
Time:   t        2t       3t       4t       5t       6t       7t       8t
Step
A       x1+y1                                                  x2+y2
B                x1+y1                                                  x2+y2
C                         x1+y1
D                                  x1+y1
E                                           x1+y1
F                                                    x1+y1
Each floating-point addition takes 6t time units
Pipelined Floating-Point Addition
With the proper architectural design, the floating-point addition stages can
be overlapped
Time:   t      2t     3t     4t     5t     6t     7t     8t
Step
A       x1+y1  x2+y2  x3+y3  x4+y4  x5+y5  x6+y6  x7+y7  x8+y8
B              x1+y1  x2+y2  x3+y3  x4+y4  x5+y5  x6+y6  x7+y7
C                     x1+y1  x2+y2  x3+y3  x4+y4  x5+y5  x6+y6
D                            x1+y1  x2+y2  x3+y3  x4+y4  x5+y5
E                                   x1+y1  x2+y2  x3+y3  x4+y4
F                                          x1+y1  x2+y2  x3+y3
This is called pipelined floating-point addition. Once the pipeline is full and
has produced the first result in 6t time units, it only takes 1t time units to
produce each succeeding sum.
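The timing argument reduces to a small formula: k sequential additions cost 6t·k, while the pipeline costs 6t to fill and then delivers one result per t, i.e. (6 + k - 1)t in total. A sketch (the function name and parameters are illustrative):

```python
def addition_time(k, stages=6, t=1):
    """Return (sequential, pipelined) time for k floating-point adds."""
    sequential = stages * t * k            # each add occupies all stages
    pipelined = (stages + k - 1) * t       # fill once, then one per t
    return sequential, pipelined

print(addition_time(8))     # (48, 13)
print(addition_time(100))   # (600, 105)
```

The speedup approaches the number of stages (here 6) as k grows large.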
8.4 Real Numbers
One sometimes hears variations on the phrase “computers can't represent
real numbers exactly”. This, of course, is not true. Nothing prevents us from
representing (say) the square root of two as the number two and a bit
indicating that the value is the square root of the representation. Some
useful operations could be very fast this way. It is true, though, that we
cannot represent all real numbers exactly. In fact, we have a similar problem
as that we have with rational numbers, in that it is hard to pick a useful
subset that we can represent exactly, other than the floating-point numbers.
For this reason, no widespread hardware contains built-in real numbers
other than the usual approximations in the form of floating-point.
8.5 Summary
In this unit, we have dealt with various core operations to be performed in
arithmetic viz. Addition, Subtraction, Multiplication and Division. We have
discussed the various techniques for fixed point unsigned and signed
numbers arithmetic. We have also discussed Booth algorithm for
multiplication of binary numbers. A separate section is dedicated to the
Floating point standard prevalent today that includes the IEEE standard
also. We have also seen the addition operation for floating point numbers,
laying stress on the time constraint and pipelining concepts.
Self Assessment Questions
1. Using 4-bit fixed length binary, the addition of 4 and 14 results in
__________.
2. The two’s complement of -5 is __________.
3. The basic idea behind floating-point numbers is to represent a number
as __________.
4. IEEE FPS represents numbers with __________.
5. Floating point hardware usually has a special set of __________ and
__________ for performing floating point arithmetic.
8.6 Terminal Questions
1. Explain the addition of two floating point numbers with examples.
2. Discuss the different formats of floating point numbers.
3. Compute the product of 7 and 2 using Booth’s algorithm.
8.7 Answers
Self Assessment Questions
1. overflow
2. (1011)
3. as a mantissa and an exponent
4. a sign bit, the mantissa and the exponent
5. registers, instructions
Terminal Questions
1. Refer Section 8.2
2. Refer Section 8.3
3. Refer Section 8.2
Computer Organization and Architecture Unit 9
Sikkim Manipal University - DDE Page No. 220
Unit 9 Memory Unit – Part I
Structure:
9.1 Introduction
Objectives
9.2 Characteristics of Memory Systems
9.3 Main Memory
Types of Random-Access Semiconductor Memory
Organization
Static and dynamic memories
9.4 Memory system considerations
Design of memory subsystem using Static Memory Chips
Design of memory subsystem using Dynamic Memory Chips
9.5 Memory interleaving
9.6 Summary
9.7 Terminal Questions
9.8 Answers
9.1 Introduction
The memory unit is used for storage and retrieval of data and instructions. A
typical computer system is equipped with a hierarchy of memory
subsystems, some internal to the system and some external. Internal
memory is accessible by the CPU directly, while external memory is
accessed by the CPU through an I/O module.
Objectives:
By the end of Unit 9, you should be able to:
1. Define the different units of transfer of data.
2. Explain the various accessing methods
3. Explain with neat sketches various memory organizations.
4. Discuss the design of memory subsystem using dynamic memory chips.
9.2 Characteristics of Memory Systems
Memory systems are classified according to their key characteristics. The
most important are listed below:
Location
The classification of memory is done according to the location of the
memory as:
CPU: The CPU requires its own local memory in the form of registers,
and the control unit also requires fast-access local memory. We have
already studied this in detail in our earlier
discussions.
Internal (main): It is often equated with the main memory, though there
are other forms of internal memory. We will discuss internal
memory in the coming sections of this unit.
External (secondary): It consists of peripheral storage devices like hard
disks, magnetic disks, magnetic tapes, CDs etc.
Capacity
Capacity is one of the important aspects of the memory.
Word size: Word size is the natural unit of organization of memory. The
size of the word is typically equal to the number of bits used to represent a
number and is equal to the instruction length. But there are many
exceptions. Common word lengths are 8, 16 and 32 bits.
Number of words: The addressable unit is the word in many systems.
However, external memory capacity is generally expressed in terms of bytes.
Unit of Transfer
Unit of transfer for internal memory is equal to the number of data lines into
and out of memory module.
Word: For internal memory, the unit of transfer is equal to the number
of data lines into and out of the memory module. This need not be equal
to a word or an addressable unit.
Block: For external memory, data are often transferred in much larger
units than a word, and these are referred to as blocks.
Access Method
Sequential: Tape units have sequential access. Data are generally
stored in units called "records". Data is accessed sequentially; the
records may be passed (or rejected) until the record that is searched for
is found. The access time to a certain record is highly variable.
Direct: Individual blocks or records have a unique address based on
physical location. A block may contain a group of data. Access is
accomplished by direct address to reach general vicinity, plus sequential
searching, counting or waiting to reach the final location. Disk units
have direct access.
Random: Each addressable location in memory has a unique,
physically wired-in addressing mechanism. The time to access a given
location is independent of the sequence of prior accesses and constant.
Any location can be selected at random and directly addressed and
accessed. Main memory and some cache systems are random access.
Associative: This is a random-access type of memory that enables one
to make a comparison of desired bit locations within a word for a
specified match, and to do this for all words simultaneously. Thus, a
word is retrieved based on a portion of its contents rather than its
address. Some cache memories may employ associative access.
Performance
Access time: For random-access memory, this is the time it takes to
perform a read or write operation. That is, the time from the instant that
an address is presented to the memory to the instant that data have
been stored or made available for use. For non-random-access memory,
access time is the time it takes to position the read-write mechanism at
the desired location.
Cycle time: This applies to random-access memory. It consists of the
access time plus any additional time required before a second access
can commence.
Transfer rate: This is the rate at which data can be transferred into or
out of a memory unit. For random-access memory, it is equal to
1/(cycle time).
For non-random-access memory, the following relationship holds:
TN = TA + N/R
where TN = average time to read or write N bits,
TA = average access time,
N = number of bits,
R = transfer rate, in bits per second (bps).
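As a worked instance of this relationship (the numbers below are illustrative assumptions, not taken from the text):

```python
def read_time(ta, n_bits, rate_bps):
    """TN = TA + N/R for a non-random-access memory."""
    return ta + n_bits / rate_bps

# e.g. 10 ms average access time, a 4096-bit block, 1 Mbit/s transfer rate
print(read_time(0.010, 4096, 1_000_000))   # ≈ 0.014096 s (10 ms + 4.096 ms)
```

The access-time term dominates for small transfers, while the N/R term dominates for large blocks.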
Physical Type
Semiconductor: Main memory, cache. RAM, ROM.
Magnetic: Magnetic disks (hard disks), magnetic tape units.
Optical: CD-ROM, CD-RW.
Magneto-Optical: The recording technology is fundamentally magnetic.
However an optical laser is used. The read operation is purely optical.
Physical Characteristics
Volatile/Non-volatile: In a volatile memory, information decays naturally
or is lost when electrical power is switched off. In a non-volatile memory,
information once recorded remains without deterioration until
deliberately changed; no electrical power is needed to retain information.
Magnetic-surface memories are nonvolatile. Semiconductor memories
may be either volatile or non-volatile.
Erasable/Non-erasable: Non-erasable memory cannot be altered
(except by destroying the storage unit). ROMs are non-erasable.
Memory Hierarchy
Design constraints: How much? How fast? How expensive?
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit,
Greater capacity, slower access time.
9.3 Main Memory
The main memory stores data and instructions. Main memories are usually
built from dynamic ICs known as dynamic RAMs (DRAMs). Semiconductor
ICs can also implement static memories, referred to as static RAMs
(SRAMs). SRAMs are faster, but their cost per bit is higher. They are often
used to build caches.
Types of Random-Access Semiconductor Memory:
Dynamic RAM (DRAM): Example: Charge in capacitor. It requires periodic
refreshing.
Static RAM (SRAM): Example: Flip-flop logic-gates. Applying power is
enough (no need for refreshing). A dynamic RAM cell is simpler and hence
smaller than a static RAM cell, so DRAMs are denser and less expensive,
but they require supporting refresh circuitry. Static RAMs are faster than
dynamic RAMs.
ROM: The data is actually wired in the factory. It can never be altered.
PROM: Programmable ROM. It can only be programmed once after its
fabrication. It requires a special device to program.
EPROM: Erasable Programmable ROM. It can be programmed multiple
times. The whole capacity needs to be erased by ultraviolet radiation before
a new programming activity; it cannot be partially erased or reprogrammed.
EEPROM: Electrically Erasable Programmable ROM. Erased and
programmed electrically. It can be partially programmed. Write operation
takes considerably longer time compared to read operation.
In general, the more flexible a ROM variant is, the more expensive it is to
build and the smaller its capacity compared to less flexible variants.
Organization
Basic element of semiconductor memory is the memory cell. All
semiconductor memory cells have certain properties:
Have two stable states that represent binary 0 and 1.
Capable of being written into (at least once), to set the state.
Capable of being read to sense the state.
Individual cells can be selected for reading and writing operations.
The cell has three functional terminals, which are shown in figure 9.1. The
select terminal selects a cell for a read or write operation, the control
terminal indicates read or write, and the third terminal is used to write into
the cell, that is, to set the state of the cell to 0 or 1. For a read operation,
the same third terminal is used to output the state of the cell.
Figure 9.1: Memory cell organization (a) for write and (b) for read operation
Chip Logic
Semiconductor memory comes in packaged chips. Each chip contains an
array of memory cells. Two organizational approaches have been used:
2D and 2½D.
Figure 9.2: 2D memory organizations
A) 2D memory organization
In a 2D memory organization the physical arrangement of the memory cells
in the array matches the logical arrangement of words in memory. Figure 9.2
shows the 2D memory organization. A memory chip has W words of B bits
each; hence its capacity is (W x B) bits.
E.g.: A 16-Mbit memory chip can be organized as 1M 16-bit words or as 4M
4-bit words. The 1st memory chip will have 20 address and 16 data lines,
the 2nd memory chip will have 22 address and 4 data lines. Remember that
the lines may be multiplexed, so fewer address pins can be used on a
specific memory chip.
Number of Address Lines = log2 W.
The figure also depicts the additional circuitry. Address lines supply the
address of the word to be selected. A total of log2 W address lines are
input to the decoder, which activates exactly one of its W output lines
according to the address bit pattern.
Example: Address 0101 input to the decoder activates the 6th output line
(addresses start from 0). This output is then used to select one of the word
lines. The data lines transfer the B bits of the selected word between the
chip and the data buffer, all bits simultaneously. The disadvantage of the
2D organization is that all bits of any given word are on the same chip.
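The address- and data-line arithmetic from the 16-Mbit example can be sketched with a toy helper (the function name is mine, not from the text):

```python
import math

def chip_pins(total_bits, word_bits):
    """Address and data line counts for a (W x B)-bit 2D organization."""
    words = total_bits // word_bits          # W = capacity / word size
    return math.ceil(math.log2(words)), word_bits

mbit16 = 16 * 2 ** 20                        # a 16-Mbit chip
print(chip_pins(mbit16, 16))   # (20, 16): the 1M x 16 organization
print(chip_pins(mbit16, 4))    # (22, 4):  the 4M x 4 organization
```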
B) 2½D memory organization
Figure 9.3: 2½ D memory organization
The 2½D memory organization is as shown in figure 9.3. In 2½D memory
organization the bits of a particular word are spread across multiple chips.
The most extreme and most common organization is to allow only 1 bit of a
given word on a chip. Now the array itself is like a matrix, each cell is
connected to a row line and a column line. For any operation to select a bit
of a particular word, the word address is split into two. One part is fed to the
decoder to select one row and second part of address is fed to another
decoder to select one of the columns.
From the system standpoint, the memory unit can be viewed as a “black
box”. Data transfer between the main memory and the CPU registers takes
place through two registers, namely MAR (memory address register) and
MDR (memory data register). If MAR is k bits long and MDR is n bits long,
the main memory unit can contain up to 2^k addressable locations. During a
memory cycle, n bits of data are transferred between main memory and
CPU. This transfer takes place over the processor bus, which has k address
lines and n data lines.
Figure 9.4: Connection of main memory to CPU
The bus also includes control lines Read, Write, and Memory Function
Completed (MFC) for coordinating the data transfers. In the case of byte
addressable computers, another control line is added to indicate when only
a byte, rather than a full word of n bits, is transferred. Fig. 9.4 shows the
connection between the CPU and main memory. The CPU initiates a
memory operation by loading the appropriate data into registers MDR and
MAR, and setting either Read or Write memory control line to 1. When the
required operation is completed the memory control circuitry sends MFC
signal to CPU.
The time that elapses between the initiation of an operation and the
completion of that operation is called the memory access time. The minimum
time delay between two successive memory operations is called the memory
cycle time. The cycle time is usually slightly longer than the access time.
The main memory is Random Access Memory.
Therefore any location can be accessed for a Read or Write operation in
some fixed amount of time. Semiconductor Integrated Circuits are used for
the implementation of main memories.
Some of the techniques used to increase the effective speed and size of the
main memory are discussed in the following sections.
Static and Dynamic Memories
Semiconductor memories in which the storage cells are small transistor
circuits are used for high speed CPU registers. Single chip RAMs can be
manufactured in sizes ranging from a few hundred bits to 1 GB or more.
Semiconductor memories fall into two categories: SRAMs (static RAMs) and
DRAMs (dynamic RAMs). Both bipolar transistors and MOS (Metal Oxide
Semiconductor) transistors are used for designing RAMs, but MOS is the
dominant technology for designing large RAMs.
A semiconductor memory constructed using bipolar or MOS transistors
stores information in the form of flip-flop voltage levels. These voltage
levels do not decay on their own, so the information remains unchanged for
long periods of time. Such memories are called static memories. A
semiconductor memory designed using a MOS transistor with a capacitor
stores the information in the form of charge on the capacitor. The stored
charge has a tendency to leak away: a stored ‘1’ will become ‘0’ if no
precautions are taken. These types of memory are called dynamic
memories.
A) Static memories
Static RAM cells use 4 – 6 transistors to store a single bit of data. This
provides faster access times at the expense of lower bit densities. A
processor's internal memory (registers and cache) is fabricated using static
RAM. SRAMs resemble the flip-flops used in the processor design. SRAM
cells differ from flip-flops primarily in the methods used to address the cells
and transfer data to and from them.
The six-transistor SRAM cell is shown in Fig. 9.5. A signal applied to the
word line (also called the address line) by the address decoder selects the
cell for either a Read or a Write operation. The two bit lines (also called
data lines) are used to transfer the stored data and its complement between
the cell and the data drivers.
Fig. 9.5: An n-channel MOS memory cell
Static RAM is used extensively for second level cache memory, where its
speed is needed and a relatively small memory will lead to a significant
increase in performance.
B) Dynamic memories
The bulk of a modern processor's memory is composed of dynamic RAM
(DRAM) chips. A DRAM memory cell uses a single transistor and a
capacitor to store a bit of data. In a DRAM cell, the ‘1’ and ‘0’ states
correspond to the presence or absence of a stored charge in a capacitor
controlled by the transistor switching circuit. Since a DRAM cell can be
constructed from a single transistor, the storage density is higher. Since the
charge stored in a DRAM may leak away with time, the cell must be
periodically refreshed.
Fig. 9.6 illustrates a one-transistor DRAM cell. It uses a MOS transistor,
which acts as a switch, and a capacitor to store a data bit. To write
information into the cell, a voltage signal is applied to the data line; the
signal can be either high or low, representing 1 and 0 respectively. A signal
is applied to the word line to switch on T. The capacitor then charges if the
data line is 1. When the transistor is off, the capacitor begins to discharge,
due both to the capacitor’s own leakage resistance and to the fact that the
transistor continues to conduct a very small amount of current after it is
turned off. Hence the information stored in the cell can be retrieved correctly
only if it is read before the charge on the capacitor drops below some
threshold value. The memory cell is therefore refreshed every time its
contents are read. When a DRAM is being refreshed, other accesses must
be "held off". This increases the complexity of DRAM controllers.
Fig. 9.6: A single-transistor dynamic memory cell
To read the cell, the word line is activated. The charge stored in the
capacitor is then transferred to the bit line, where it is detected. Almost all
DRAMs require the address applied to the device to be asserted in two
parts: a row address and a column address. This has a deleterious effect on
the access time, but enables devices with large numbers of bits to be
fabricated with fewer pins (enabling higher densities).
Let us study the internal organization of a 64K × 1 dynamic memory chip.
The cells are arranged in the form of a square array. The higher-order 8 bits
of the 16-bit address constitute the row address of a cell and the lower-order
8 bits constitute the column address. The row and column addresses are
multiplexed on eight pins. This is done to reduce the number of pins needed
for external connections. During a Read or Write operation the row address
is applied first. It is loaded into the row address latch under the control of
the Row Address Strobe (RAS) input of the chip.
Then a Read operation is initiated in which all cells in the selected row are
read and refreshed. Soon after the row address is loaded, the column
address is applied to the address pins and loaded into the column address
latch under the control of the Column Address Strobe (CAS) signal. The
column address is decoded in the column decoder and the appropriate
write/sense circuit is selected. If the R/W control signal indicates a Read
operation, the output of the selected circuit is transferred to the data output DO.
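The two-part addressing can be sketched as follows, assuming the 16-bit address and eight shared address pins described above; the function name is ours, for illustration:

```python
def split_address(addr):
    """Split a 16-bit DRAM address into the row and column bytes that are
    driven onto the chip's 8 shared address pins, one after the other."""
    assert 0 <= addr < 1 << 16
    row = (addr >> 8) & 0xFF   # high-order 8 bits, latched under RAS
    col = addr & 0xFF          # low-order 8 bits, latched under CAS
    return row, col

print(split_address(0xA53C))  # (0xA5, 0x3C)
```

Multiplexing halves the number of address pins at the cost of applying the address in two steps.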
[Figure: the row address latch and row decoder selecting a row of the 256 × 256 cell array; sense/write amplifiers feeding the column decoder and column address latch; inputs RAS, CAS, CS, R/W and A7-0, with data input DI and data output DO]
Fig. 9.7: Internal organization of a 64K × 1 dynamic memory chip
To perform a Write operation, the information at the data input DI is given to
the column decoder. The information is then used to overwrite the contents
of the selected cell in the corresponding column.
Application of a row address causes all cells on the corresponding row to be
read and refreshed during both Read and Write operations. To ensure that
the contents of a dynamic memory are maintained, each row of cells must be
addressed periodically. A refresh circuit usually performs this function. In
some dynamic memory chips the refresh facility is available within the chip
itself. In such cases the dynamic nature of these chips is completely invisible
to the user. Such chips are known as pseudo-static.
Advantages
1. High bit density
2. Available chips range from 1K to 4M bits, and even larger chips are
being developed.
3. Low power dissipation
Disadvantages
Slower speed of operation
9.4 Memory System Considerations
While choosing a RAM chip for a given application several factors have to
be taken into account. The most important factors are the speed, power
dissipation, and size of the chip. In some situations other features such as
availability of block transfers may be important. Because of the high power
dissipation in bipolar circuits, high bit densities cannot be achieved. Hence a
memory constructed using bipolar chips needs a relatively large number of
chips. Bipolar (static) memories are used only when very fast
operation is required.
Static MOS memory chips have higher densities and slightly longer access
time than bipolar memories. They have lower densities than dynamic
memories, but are easier to use because they do not require refreshing
circuits. Since they do not need refreshing, static RAMs consume less power
than dynamic RAMs, so SRAMs are found in battery-powered systems. The
absence of refresh circuitry also leads to slightly simpler systems, so SRAM
is also found in very small systems, where the simplicity of the circuitry
compensates for the cost of the memory devices themselves.
Dynamic MOS memory is the dominant technology used in computer main
memories. Since higher densities are achieved with this technology, it is
economical to implement large memories using dynamic MOS memory.
Let us study how a memory subsystem is designed using static and
dynamic memory chips.
Design of memory subsystem using Static Memory Chips
Consider a small memory consisting of 64K words, each word of 16 bits.
Figure 9.8 shows the organization of this memory using 16K × 1 static memory
chips. A static memory chip has a control input called Chip Select
(CS). When CS is set to 1, it enables the chip to accept data input or to
place data on the output line. The data output of each chip is of the three-state
type: only the selected chip places data on the output line, while all the
other outputs are in the high-impedance state.
There are 4 chips in each column, and each column represents one bit position.
There are 16 such columns, building a 64K × 16 memory. The address bus
required for this memory is 16 bits wide. The high-order 2 bits of the
address are decoded to obtain the 4 chip-select control signals. The
remaining 14 address bits are used to access the specified bit location inside
each chip of the selected row. The R/W inputs of all chips are connected
together to provide a common Read/Write control.
Fig. 9.8: Organization of a 64K × 16 memory using static memory chips
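The chip-select decoding just described can be sketched as follows; the function name is ours, for illustration:

```python
def decode(addr):
    """Decode a 16-bit address for the 64K x 16 memory built from four
    rows of 16K x 1 chips, as described above."""
    assert 0 <= addr < 1 << 16
    chip_row = addr >> 14      # high-order 2 bits -> one of 4 chip-select lines
    internal = addr & 0x3FFF   # remaining 14 bits -> location inside each chip
    return chip_row, internal

print(decode(0xC001))  # (3, 1): chip row 3, bit location 1 in every chip
```

All 16 chips of the selected row respond together, each contributing one bit of the 16-bit word.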
Design of memory subsystem using Dynamic Memory Chips
Now consider a memory implemented using dynamic memory chips. The
organization of this memory is the same as that of the static memory. However,
the control circuitry of this memory differs from that of the static memory in three
respects. The block diagram of a 256K × 16 dynamic memory is
shown in figure 9.9. First, the row and column parts of the address for each
chip have to be multiplexed. Second, a refresh circuit is needed. Finally, the
timing of the various steps of the memory cycle must be carefully controlled.
[Figure: the access control, refresh control, refresh counter, address multiplexer, timing control and decoder blocks of the control circuitry, driving the RAS, CAS, CS0-3 and R/W inputs of a 4 × 16 array of DRAM chips; bus signals include ADRS17-0, DATA15-0, Read/write, Start, MFC, Busy, and the refresh request, refresh grant, refresh line and Row/column signals]
Figure 9.9: A block diagram of a 256K × 16 dynamic memory
The block diagram consists of the dynamic memory chips and the control
circuitry. The dynamic memory chips are arranged in a 4 × 16 array. If the
individual chips have a 64K × 1 organization, then the total storage capacity
of the array is 256K words, each word of 16 bits. The control circuitry provides
the multiplexed address, chip select, and row and column address strobe
signals (RAS and CAS) to the memory chip array. The memory unit is assumed
to be connected to an asynchronous memory bus that has 18 address lines
(ADRS17-0), 16 data lines (DATA15-0), two handshake signals (Memory
Request and MFC) and a Read/Write line to indicate the type of
memory cycle required.
Operation of the control circuitry for a memory read cycle
1. The cycle starts when the CPU activates the address, the Read/Write
and the Memory Request lines.
2. When the Memory Request signal becomes active, the access control
block recognizes the request and sets the Start signal to 1.
3. The timing and control circuit sends a Busy signal in order to prevent the
access control block from accepting new requests until the current
cycle ends.
4. The timing and control block then loads the row and column addresses into
the memory chips by activating RAS and CAS. It first uses the
Row/column line to select the row address (ADRS15-8), followed by the
column address (ADRS7-0).
5. The decoder block decodes the two most significant bits of the address and
generates the 4 chip-select signals CS0-3.
6. After receiving the row and column parts of the address, the selected
memory chips place the contents of the requested bit cells on their data
outputs.
7. This data is then transferred to the data lines of the memory bus through
appropriate drivers.
8. The timing and control circuit now sends the MFC signal to the CPU, indicating
that the requested data is available on the memory bus.
9. Finally the busy signal is deactivated so that the access control unit is
now free to accept new requests.
The timing unit is responsible throughout the process to ensure that various
signals are activated and deactivated in accordance with the specifications
of the particular type of memory chips used.
Control sequence for refresh operation
The main purpose of the refresh circuit is to maintain the integrity of the
stored contents of the cells. Refresh requests are given priority over CPU
access requests to ensure that no information is lost when both arrive
simultaneously.
Sequence of refresh operation is as follows:
1. The refresh control block periodically generates refresh requests,
causing the access control block to start a memory cycle in the normal
way.
2. The access control block arbitrates between memory access requests
and refresh requests. If two requests are activated simultaneously,
refresh requests are given priority in order to ensure that no information
is lost.
3. As soon as the refresh control block receives the refresh grant signal, it
activates the refresh line.
4. The address multiplexer selects the refresh counter, instead of ADRS15-8,
as the source for the row address, and the contents of the counter are
loaded into the row address latches of all the memory chips when the
RAS signal is activated.
5. During this time, the Read/Write line may indicate a Write operation. It
is important to ensure that this does not inadvertently cause new
information to be loaded into some of the cells that are being refreshed.
This protection can be provided in several ways. One way is to
have the decoder block deactivate all CS lines so that the memory
chips do not respond to the Read/Write line. The remainder of the
refresh cycle is then the same as in a normal cycle.
6. At the end, the refresh control block increments the refresh counter in
preparation for the next refresh cycle.
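The arbitration and counting in the steps above can be sketched as follows; the function names and return values are illustrative, not the actual control signals:

```python
# Sketch of the refresh arbitration (step 2) and the 7-bit refresh
# counter (steps 4 and 6) described above.

def arbitrate(access_request, refresh_request):
    """Grant at most one request per cycle; a refresh request wins a
    simultaneous arrival so that no stored information is lost."""
    if refresh_request:
        return "refresh"
    if access_request:
        return "access"
    return "idle"

refresh_counter = 0

def next_refresh_row():
    """Return the row to refresh, then advance the counter for the next
    refresh cycle; 7 bits suffice, so it wraps at 128."""
    global refresh_counter
    row = refresh_counter
    refresh_counter = (refresh_counter + 1) % 128
    return row

print(arbitrate(True, True))  # refresh wins the tie
```

A real controller does this in hardware, of course; the sketch only fixes the priority rule and the counter's wrap-around.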
Although an 8-bit row address is required, the refresh counter needs to be only
seven bits wide because of the cell organization inside the memory chips. In
fact, the 256 × 256 array in Fig. 9.9 consists of two 128 × 256 arrays, each
having its own set of write/sense circuits. One row of each of the two
arrays is accessed during any memory cycle, depending on the lower-order
7 bits of the row address. The most significant bit is used only in a normal
Read/Write cycle, to select one of the two groups of 256 columns.
Because of this organization, the frequency of refresh operations can be
reduced to half of what would be needed if the memory cells had been
organized in a single 256 × 256 array.
The main purpose of the refresh circuit is to maintain the integrity of the
stored information. Its existence should be invisible to the remainder of the
computer system. Also other parts of the system should not be affected by
the operation of the refresh circuit. If memory access and refresh requests
occur simultaneously, the refresh circuit is given priority so that no stored
information is lost. Thus the response of the memory to a request from the CPU
may be delayed if a refresh operation is in progress. The amount of delay
depends on the mode of refresh operation. During a refresh operation, all
memory rows may be refreshed in succession before the memory is
returned to normal use. A more common scheme, however, interleaves
refresh operations on successive rows with accesses from the memory bus,
which results in shorter, but more frequent refresh periods.
In the case of a synchronous memory bus, it may be possible to hide a refresh
cycle within the early part of a bus cycle, if sufficient time remains after the
refresh cycle to carry out a Read or Write access.
9.5 Memory interleaving
Another technique, called memory interleaving, divides the memory system into a
number of modules and arranges them so that successive words in the
address space are placed in different modules. If memory access requests
are made for consecutive addresses, then the accesses will be made to
different modules. Since parallel access to these modules is possible, the
average rate of fetching words from the main memory can be increased.
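A minimal sketch of this placement, assuming four modules and low-order interleaving as described:

```python
NUM_MODULES = 4

def locate(addr):
    """Low-order interleaving: consecutive addresses fall in consecutive
    modules, so a run of 4 addresses can be accessed in parallel."""
    return addr % NUM_MODULES, addr // NUM_MODULES  # (module, word within module)

print([locate(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

In hardware this costs nothing: the two low-order address bits select the module and the remaining bits address the word inside it.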
[Figure: four memory modules M1–M4, each with its own MAR and MDR, connected through a memory switching system to 4 instruction buffers in the processor; the data lines are 4 words wide]
Figure 9.10: Memory interleaving
Fig. 9.10 illustrates an interleaved memory where the main memory is divided into
4 modules. When an instruction fetch is issued by the processor,
a memory access circuit creates four consecutive addresses and places
them in the four MARs. A memory read command reads all four modules
simultaneously and retrieves four instructions. These are sent to the
processor. Thus each fetch retrieves 4 consecutive instructions.
One may combine interleaving and caching to reduce the speed mismatch
between the cache memory and the main memory. To illustrate this, consider
the time required to transfer a block of data from the main memory to
the cache when a read miss occurs. Assume that a cache with 8-word
blocks is used. When a cache miss occurs, the block that contains the desired
word must be copied from the main memory into the cache.
Assume that the hardware has the following properties:
1. It takes one clock cycle to send an address to the main memory.
2. The memory is built with DRAM chips that allow the first word of a block to be
accessed in 8 clock cycles; subsequent words of the block are
accessed in 4 clock cycles per word.
3. One clock cycle is needed to send one word to the cache.
If a single memory module is used, then the time needed to load the desired block
into the cache is 1 + 8 + (7 × 4) + 1 = 38 cycles.
Now assume that the main memory is divided into 4 modules using the
interleaving technique. When the starting address of the
block arrives at the memory, all four modules start accessing the
required data, using the higher-order bits of the address. After 8 clock cycles,
each module has one word of data in its MDR. These words are transferred
to the cache, one word at a time, during the next four clock cycles. During this
time the next word in each module is accessed. It then takes another 4
cycles to transfer these words to the cache. Therefore the total time required
to load the block from the interleaved memory is 1 + 8 + 4 + 4 = 17 cycles.
Thus interleaving reduces the block transfer time by more than a factor of 2.
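The two cycle counts can be reproduced directly from the stated timing assumptions:

```python
# Cycle count for loading one 8-word cache block, using the timing
# assumptions listed above.

ADDR = 1    # cycles to send the address to memory
FIRST = 8   # cycles to access the first word in a DRAM module
NEXT = 4    # cycles to access each subsequent word
XFER = 1    # cycles to send one word to the cache
BLOCK = 8   # words per cache block

# Single module: the words come out strictly one after another.
single = ADDR + FIRST + (BLOCK - 1) * NEXT + XFER
print(single)  # 38

# Four interleaved modules: all four access in parallel, so after the
# first 8-cycle access each group of 4 words is transferred in 4 cycles
# while the next word in every module is fetched in the background.
interleaved = ADDR + FIRST + (BLOCK // 4) * 4 * XFER
print(interleaved)  # 17
```

The saving comes entirely from overlap: the second group's 8-cycle access hides behind the first group's transfer.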
9.6 Summary
A computer system is equipped with a hierarchy of memory subsystems.
There are several memory types with very different physical properties. The
important characteristics of memory devices are cost per bit, access time,
data transfer rate, alterability, and compatibility with processor
technologies.
The main memory is a major component in any computer system. Its
characteristics in terms of size and speed play an important role in
determining the performance of a given computer. There are several
techniques available to increase the effective speed and size of the
memory. An intermediate memory called cache memory can be introduced
between main memory and CPU. We have introduced the reader the
principle, structure of cache memory along with the mapping functions and
replacement algorithms. Finally we touched upon few external memory
elements and another method to increase the effective size of the memory
is to introduce virtual memory.
Self Assessment Questions
1. ____________ requires its own local memory in the form of registers.
2. ____________ is often equated with the main memory.
3. Disk units have ____________.
4. ____________ is dominant technology for designing large RAMs.
5. Static RAM cells use ____________ transistors to store a single bit of
data.
9.7 Terminal Questions
1. Explain the characteristics of memory system.
2. Discuss the organization of main memory.
3. Explain the concept of memory interleaving.
9.8 Answers
Self Assessment Questions:
1. CPU
2. internal
3. direct access
4. MOS
5. 4 to 6
Terminal Questions:
1. Refer Section 9.2
2. Refer Section 9.3
3. Refer Section 9.5
Unit 10 Memory Unit – Part II
Structure:
10.1 Introduction
Objectives
10.2 Cache Memory
Principles of cache memory
Structure of cache and main memory
Performance using cache memory
Elements of Cache Design
Mapping functions
Replacement algorithms
10.3 External Memory
Magnetic Disk
RAID
10.4 Virtual memory
10.5 Memory Management in Operating Systems
10.6 Summary
10.7 Terminal Questions
10.8 Answers
10.1 Introduction
The memory unit is used for storage and retrieval of data and instructions. A
typical computer system is equipped with a hierarchy of memory
subsystems, some internal to the system and some external. Internal
memory systems are accessible by the CPU directly, and external memory
systems are accessible by the CPU through an I/O module.
Objectives:
By the end of Unit 10, you should be able to:
1. explain cache memory, with a neat diagram, and its interaction with the
processor
2. explain the cache read operation
10.2 Cache Memory
In every instruction cycle, the CPU accesses the memory at least once to
fetch the instruction, and often again to fetch the
operands. The rate at which the CPU can execute instructions is limited by
the memory cycle time. This limitation is due to the mismatch between the
memory cycle time and the processor cycle time. Ideally, the main memory
would be built with the same technology as the CPU registers, giving
memory cycle times comparable to processor cycle times, but this is too
expensive a strategy.
The solution is to exploit the principle of locality by providing a small, fast
memory between the CPU and the main memory. This memory is known as
cache memory. Thus the intention of using cache memory is to give memory
speed approaching the speed of fastest memories available, and at the
same time provide a large memory size at the price of less expensive types
of semiconductor memory.
Principles of Cache Memory
[Figure (a): the cache, consisting of a cache data memory and a cache tag memory (directory), with address, control and data lines; (b): word transfers between the CPU and the smaller, faster cache, and block transfers between the cache and the larger, slower main memory]
Figure 10.1 (a & b): Cache and main memory
The concept is illustrated in figure 10.1. The cache memory contains
copies of portions of main memory. When the CPU attempts to read a word from
main memory, a check is made to find whether the word is in the cache. If so,
the word is delivered to the CPU from the cache. If not, a block of main
memory consisting of a fixed number of words is read into the cache, and
then the word is delivered to the CPU. The reason for reading a whole
block from main memory is that it is likely that future references will be to
other words in the block.
At any time, a subset of main memory resides in the lines of the cache.
Each line of the cache contains a tag, which identifies the location in
memory of the block it contains. This
tag is usually a portion of the main memory address. The collection of tags
currently assigned to the cache is stored in a special memory,
called the cache tag memory or directory, as shown in fig 10.1(a). The time
required to access the cache memory is less than the time required to
access main memory.
During a Read or Write cycle, the CPU initiates the memory access by
placing an address on the memory bus. This address is compared with the
tag addresses of the cache. If a tag matches the address, a cache hit
is said to occur; otherwise a cache miss is said to occur. When a cache hit
occurs, the Read or Write operation is performed in the cache and the main
memory is not involved. In the case of a cache miss, the required data is
brought from main memory into the cache. When the cache is full, the cache
control hardware must decide which block is to be removed to create space
for the new block that contains the referenced word. The collection of rules
for making this decision constitutes the replacement algorithm. The complete
cache organization is shown in figure 10.1(c). The CPU does not know
explicitly about the existence of the cache.
Figure 10.1(c): Cache organization showing the interconnection with the
processor
When the CPU makes Read and Write requests using addresses that refer to
locations in the main memory, the memory access control circuitry
determines whether the requested word currently exists in the cache. The
transfers between the cache and the CPU are fast because of the small
block size and fast RAM access methods.
Structure of cache and main memory
The structure of the cache and main memory is shown in figure 10.2.
Main memory consists of 2^n addressable words, each word having a
unique n-bit address. This memory is considered to consist of a number of
fixed-length blocks of K words each; that is, M = 2^n / K blocks. The cache
consists of C lines of K words each. The number of lines C of the cache is
much smaller than the number of main memory blocks M:
C << M
Figure 10.2: Cache and Main memory
Cache Read operation
Now let us study how cache read operation is executed. The flow chart
illustrating the steps of cache read operation is given in figure 10.3. The
processor generates the address 'RA' of a word to be read. If the word is
contained in the cache, it is delivered to the processor. Otherwise, the block
containing that word is loaded into the cache and simultaneously the word is
delivered to the processor from main memory. These last two operations
occur in parallel.
Figure 10.3: Cache Read Operation
In case of interleaved memory, contiguous block transfers are very efficient.
Transferring data in blocks between the main memory and the cache
enables an interleaved memory to operate at its maximum possible speed.
A cache can be introduced in two general ways. They are look-aside design
and look-through design.
A) Look-aside design
Figure 10.4 shows a look-aside design. In this case both CPU and main
memory are directly connected to the system bus. When a cache miss
occurs, it results in data transfer between cache and main memory through
the system bus, making it unavailable for other I/O operations. This
drawback can be overcome in look-through design.
[Figure: CPU, cache and main memory all connected to the system bus; cache accesses, block replacements and main memory accesses share the same bus]
Figure 10.4: Organization of look-aside design
B) Look-through design
It is a faster, but more complex and costly, organization compared to the look-aside
design. A block diagram of the look-through design is given in figure
10.5. The communication between CPU and cache is through a separate
bus, which is isolated from the main system bus. Hence the system bus is
available for other units, such as I/O controllers to communicate with main
memory.
Thus memory accesses which do not involve the system bus can be
performed concurrently. In a look-through cache the local bus linking main
memory and cache is wider than the system bus, so data transfer
between cache and main memory is faster. For example, if the system data bus
is 32 bits wide and the cache block size is 128 bits, a 128-bit data bus might be
provided to link the cache and main memory.
Figure 10.5: Organization of look-through cache
The main drawback of this design is that it takes more time for main memory
to respond to the CPU when a cache miss occurs.
Performance using Cache Memory
The data is transferred from the main memory to the cache in blocks.
Typical block sizes are 4 to 64 words. When the cache is full, one of the
existing blocks is evicted using a standard replacement policy such as
First-In-First-Out (FIFO) or Least Recently Used (LRU). This block transfer is carried
out in anticipation that the block is very likely to be referenced again and
again in the near future. The performance of a system that employs cache
can be analyzed as follows:
Let t_c, h and t_m represent the cache access time, the hit ratio and the main
memory access time respectively. Then the average access time is given by
t_av = h·t_c + (1 - h)(t_c + t_m)
The hit ratio always lies in the closed interval 0 to 1, and it specifies the
relative number of successful references to the cache. The above equation
is derived using the fact that when there is a cache hit the main memory is
not accessed, while in the case of a cache miss both the cache and the main
memory are accessed.
Suppose the ratio of the main memory access time to the cache access time
is γ = t_m / t_c. Then an expression for the efficiency of a system that uses a
cache can be derived as follows:
Efficiency η = t_c / t_av
= t_c / [h·t_c + (1 - h)(t_c + t_m)]
= 1 / [h + (1 - h)(1 + t_m/t_c)]
= 1 / [h + (1 - h) + (1 - h)γ]
= 1 / [1 + (1 - h)γ]
Thus η is maximum when h = 1; i.e. the efficiency of a system that uses cache
is maximum when all the references are confined to the cache.
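A quick check of these formulas, with illustrative timings (10 ns cache, 100 ns main memory, 95% hit ratio; the numbers are assumptions, not values from the text):

```python
def avg_access_time(h, t_c, t_m):
    """t_av = h*t_c + (1 - h)*(t_c + t_m)."""
    return h * t_c + (1 - h) * (t_c + t_m)

def efficiency(h, gamma):
    """eta = 1 / (1 + (1 - h) * gamma), where gamma = t_m / t_c."""
    return 1.0 / (1.0 + (1.0 - h) * gamma)

# Illustrative numbers: 95% hit ratio, 10 ns cache, 100 ns main memory.
t_av = avg_access_time(0.95, 10, 100)
print(round(t_av, 6))                     # 15.0 ns on average
print(round(10 / t_av, 6))                # eta computed as t_c / t_av
print(round(efficiency(0.95, 10.0), 6))   # same value from the closed form
```

Note how a mere 5% miss rate drags a 10 ns cache down to 15 ns average access: the miss penalty dominates, which is why hit ratio matters so much.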
Elements of Cache Design
Cache Size: Small cache => performance problems, large cache => too
expensive.
Mapping Function:
a) Direct Mapping:
Each line of cache can store specific blocks of main memory. The line
number is given as
i = j modulo m,
where i = cache line number, j = main memory block number, and
m = number of lines in the cache.
b) Associative Mapping:
Permits loading of any main memory block into any line of the cache.
c) Set associative Mapping:
It is a compromise that exhibits the strengths of both the direct and
associative approaches while reducing their disadvantages.
We will be discussing the mapping function in detail in the following section:
Replacement Algorithm (for lines)
When a new block is brought into cache, one of the existing blocks must
be replaced. The most common algorithms are:
Least recently used (LRU)
First in first out (FIFO)
Least frequently used (LFU)
Random
We will be discussing the Replacement algorithms in detail in the following
section:
Write Policy:
Before a block that is resident in the cache can be replaced, it is
necessary to consider whether it has been altered in the cache but not in
the main memory.
Write through:
All write operations are both directly done to main memory and to cache.
Main memory always contains valid content.
Write back:
Writes are done only to the cache. An UPDATE bit is set whenever there
is a write. Before a cache block (line) is replaced, if its UPDATE bit is set,
its content is written back to main memory. The problem is that portions of
main memory are invalid for a certain period of time; if other devices access
those locations, they will get stale content. Therefore access to main
memory by I/O modules can only be allowed through the cache. This
makes for complex circuitry and a potential bottleneck.
Write once:
Problem: if there are other caches in the system (e.g. a multiprocessor
system sharing the same main memory), then even with "write through",
the other caches may contain invalid content.
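The write-through and write-back policies above can be contrasted in a small sketch; the class and function names are ours, for illustration:

```python
class Line:
    """One cache line with the UPDATE (dirty) bit from the text."""
    def __init__(self):
        self.data = None
        self.dirty = False

def write_through(line, memory, addr, value):
    line.data = value
    memory[addr] = value       # main memory always holds valid content

def write_back(line, value):
    line.data = value
    line.dirty = True          # memory is updated only when the line is evicted

def evict(line, memory, addr):
    if line.dirty:             # flush the altered line before replacement
        memory[addr] = line.data
        line.dirty = False

memory = {}
line = Line()
write_back(line, 7)
print(memory)                  # still empty: main memory is temporarily stale
evict(line, memory, 0x10)
print(memory)                  # {16: 7}
```

The window between `write_back` and `evict` is exactly the period during which main memory is invalid, which is what forces I/O to go through the cache.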
Line Size:
Greater line size => more hits (+) but also more line replacements (-). Too
big a line size => less chance of a hit for some parts of the block.
Researchers suggest that 2 to 8 words is reasonably close to
optimum.
Number of caches:
The cache internal to the processor is called the Level-1 (L1) cache; an
external cache is called a Level-2 (L2) cache.
Mapping functions
The correspondence between main memory blocks and cache lines is specified
by a mapping function. There are three standard mapping functions, namely
1. Direct mapping
2. Associative mapping
3. Block set associative mapping
In order to discuss these methods, consider a cache consisting of 128 blocks
of 16 words each. Assume that the main memory is addressable by a 16-bit
address. For mapping purposes, the main memory is viewed as composed of 4K
blocks.
1. Direct mapping technique
This is the simplest mapping technique. In this case, block K of the main
memory maps onto block K modulo 128 of the cache. Since more than one
main memory block is mapped onto a given cache block position, contention
may arise for that position even when the cache is not full. This is overcome
by allowing the new block to overwrite the currently resident block.
Figure 10.6: Direct mapping cache
A main memory address can be divided into three fields, TAG, BLOCK and
WORD, as shown in figure 10.6. The 5-bit TAG field is required to identify a main
memory block when it is resident in the cache. When a new block enters the
cache, the 7-bit BLOCK field determines the cache position in which this
block must be stored. On an access, the TAG field of the resident block is
compared with the TAG field of the address. If they match, then the desired
word is present in that block of the cache. If there is no match, then the block
containing the required word must first be read from the main memory and
then loaded into the cache.
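The field split above can be sketched as follows; the field widths follow from the 16-bit address, 128 blocks and 16 words per block, and the function name is ours:

```python
def address_fields(addr):
    """TAG/BLOCK/WORD split of a 16-bit address for the direct-mapped
    cache above (128 blocks of 16 words)."""
    assert 0 <= addr < 1 << 16
    word = addr & 0xF             # 4-bit WORD field: word within the block
    block = (addr >> 4) & 0x7F    # 7-bit BLOCK field: cache block position
    tag = addr >> 11              # 5-bit TAG field: identifies the memory block
    return tag, block, word

# The BLOCK field equals (main memory block number) modulo 128,
# i.e. i = j modulo m from the mapping rule:
addr = 0xABCD
assert address_fields(addr)[1] == (addr >> 4) % 128
print(address_fields(addr))  # (21, 60, 13)
```

The modulo falls out for free: taking the low 7 bits of the 12-bit block number is exactly "j modulo 128".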
2. Associative mapping technique
This is a much more flexible mapping technique. Here any main memory
block can be loaded to any cache block position. Associative mapping is
illustrated as shown in figure 10.7. In this case 12 tag bits are required to
identify a main memory block when it is resident in the cache.
Figure 10.7: Associative mapping cache
The tag bits of an address received from the CPU are compared with the tag
bits of each cache block to see if the desired block is present in the cache.
Here we need to search all 128 tag patterns to determine whether a given
block is in the cache. This type of search is called associative search.
Therefore the cost of implementation is higher. Because of the complete
freedom in block positioning, a wide range of replacement algorithms can be used.
3. Block set associative mapping
Figure 10.8: Block set associative mapping cache with two blocks per set
This is a combination of two techniques discussed above. In this case
blocks of the cache are grouped into sets and the mapping allows a block of
main memory to reside in any block of a particular set. Set associative
mapping is illustrated as shown in figure 10.8.
Consider an example, a cache with two blocks per set. The 6 bit set field of
the address determines which set of the cache might contain the addressed
block. The tag field of the address must be associatively compared to the
tags of the two blocks of the set to check if the desired block is present.
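A minimal sketch of such a lookup, assuming the 128-block, two-blocks-per-set cache above (64 sets, 6-bit set field, 6-bit tag); the names are ours, for illustration:

```python
def fields(addr):
    """Field split for the 64-set, two-way set-associative cache above."""
    word = addr & 0xF            # 4-bit word field
    set_no = (addr >> 4) & 0x3F  # 6-bit set field selects one of 64 sets
    tag = addr >> 10             # 6-bit tag field
    return tag, set_no, word

def read(cache, addr):
    """cache: list of 64 sets, each a list of (tag, 16-word block) pairs.
    Only the (at most two) tags within one set are compared associatively."""
    tag, set_no, word = fields(addr)
    for stored_tag, block in cache[set_no]:
        if stored_tag == tag:    # hit
            return block[word]
    return None                  # miss: the block must be fetched from memory

cache = [[] for _ in range(64)]
tag, set_no, _ = fields(0x1234)
cache[set_no].append((tag, list(range(16))))
print(read(cache, 0x1234))  # word 4 of the stored block
```

Compared with full associativity, the search shrinks from 128 tag comparisons to 2, which is the hardware saving the advantages below refer to.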
Advantages
a) The contention problem of the direct method is eased by having a few
choices for block placement.
b) The hardware cost is reduced by decreasing the size of the associative
search.
Note
The number of blocks per set can be selected according to the
requirements of a particular computer system. Four blocks per set can
be accommodated by a 5-bit set field, eight blocks per set by a 4-bit set
field, and so on. The extreme conditions are one block per set (the direct
mapping method) and 128 blocks per set (the fully associative technique).
Each cache block must be provided with a valid bit that indicates whether
the block contains valid data. When the main memory is loaded with
new programs and data from mass storage devices, the valid bits are
cleared to 0; the valid bit of a particular cache block is set to 1 the first
time that block is loaded from the main memory. It stays at 1 unless the
corresponding main memory block is updated by a source that bypasses the
cache. In that case, a check is made to determine whether the block is
currently in the cache. If it is, its valid bit is cleared to 0.
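The division of an address into tag, set and word fields described in this note can be sketched as follows. This is an illustrative sketch, not code from the text: it assumes the unit's running example of a 16-bit main memory address, a 128-block cache and 16 words per block, so that the tag and set fields together occupy 12 bits.

```python
def split_address(addr, blocks_per_set, cache_blocks=128, words_per_block=16):
    """Split a 16-bit main-memory address into (tag, set, word) fields.

    Assumed parameters: 128-block cache, 16 words per block, so the
    word field is 4 bits and tag + set together occupy 12 bits.
    """
    num_sets = cache_blocks // blocks_per_set
    set_bits = num_sets.bit_length() - 1          # log2(number of sets)
    word_bits = words_per_block.bit_length() - 1  # log2(words per block) = 4

    word = addr & (words_per_block - 1)           # offset within the block
    block = addr >> word_bits                     # 12-bit block number
    set_index = block & (num_sets - 1)            # which set the block maps to
    tag = block >> set_bits                       # remaining high-order bits
    return tag, set_index, word

# Two blocks per set: 6-bit set field and 6-bit tag, as in figure 10.8.
tag, set_index, word = split_address(0xABCD, blocks_per_set=2)
```

With one block per set this degenerates to direct mapping (7-bit block field, 5-bit tag), and with 128 blocks per set to fully associative mapping (a single 12-bit tag), matching the extreme cases above.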
Replacement algorithms
For any set associative mapping a replacement algorithm is needed. The
most common algorithms are discussed here. When a new block is to be
brought into the cache and all the positions that it may occupy are full, then
the cache controller must decide which of the old blocks to overwrite.
Because programs usually stay in localized areas for reasonable periods
of time, there is a high probability that blocks that have been
referenced recently will be referenced again soon. Therefore, when a block
is to be overwritten, the block that has not been referenced for the longest
time is overwritten. This block is called the least recently used (LRU) block,
and the technique is called the LRU replacement algorithm. In order to use
the LRU algorithm, the cache controller must track the LRU block as
computation proceeds.
There are several replacement algorithms that require less overhead than
LRU method. One method is to remove the oldest block from a full set when
a new block must be brought in. This method is referred to as FIFO. In this
technique no updating is needed when a hit occurs. However, because the
algorithm does not consider the recent pattern of access to blocks in the
cache, it is not as effective as the LRU approach in choosing the best block
to remove. Another method, called least frequently used (LFU), replaces the
block in the set that has experienced the fewest references; it is
implemented by associating a counter with each slot. The simplest algorithm
of all, random replacement, chooses the block to be overwritten at random.
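The LRU policy described above can be sketched in a few lines. This is a toy simulation of a single cache set (the names are ours, not from the text): an ordered dictionary keeps the resident blocks in recency order, so the least recently used block is always at the front.

```python
from collections import OrderedDict

def simulate_lru(block_refs, set_size):
    """Simulate LRU replacement within one cache set.

    On a hit the block is marked most recently used; on a miss with the
    set full, the least-recently-used block is overwritten.
    Returns the number of misses.
    """
    cache = OrderedDict()   # iteration order: least recent first
    misses = 0
    for block in block_refs:
        if block in cache:
            cache.move_to_end(block)       # hit: now most recently used
        else:
            misses += 1
            if len(cache) == set_size:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return misses
```

For the reference string 1, 2, 3, 1, 4 and a set of three blocks, the final reference to block 4 evicts block 2, the least recently used, rather than block 1, which FIFO would have chosen.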
10.3 External Memory
Magnetic Disk
A disk is a circular platter constructed of metal or of plastic coated with a
magnetic material. Data are recorded on and later retrieved from the disk
via a conducting coil called the head. During a read or write operation, the
head is stationary while the platter rotates beneath it. Writing is achieved by
producing a magnetic field that records a magnetic pattern on the
magnetic surface.
Data Organization and Formatting
Figure 10.9 depicts the data layout of a disk. The head is capable of reading
from or writing to the portion of the platter rotating beneath it. This gives
rise to the organization of data on the platter in a concentric set of rings
called tracks. Each track is the same width as the head. Adjacent tracks are
separated by gaps that minimize errors due to misalignment of the head. Data
is transferred to and from the disk in blocks, and a block is smaller than the
capacity of a track. Data is stored in block-sized regions, each an angular
part of a track, referred to as sectors. Typically there are 10-100 sectors
per track, which may be of either fixed or variable length.
Figure 10.9: Disk Data Layout
Physical Characteristics
Head motion : Fixed-head disk (one head per track) or movable-head disk
(one head per surface).
Disk portability : Nonremovable disk vs. removable disk.
Sides : Double-sided vs. single-sided.
Platters : Single-platter vs. multiple-platter disks.
Head mechanism : Contact (floppy), fixed gap, or aerodynamic gap
(Winchester [= hard disk]).
Disk Performance Parameters
1. Seek time: Time required to move the disk arm (head) to the required
track. It can be estimated as
Ts = m × n + s
where Ts = estimated seek time, n = number of tracks traversed,
m = a constant that depends on the disk drive, and s = startup time.
2. Rotational delay: Time required for the disk to rotate until the wanted
sector is beneath the head; on average half a revolution, 1 / (2 r).
3. Transfer time: T = b / (r N)
where T = transfer time, b = number of bytes to be transferred,
N = number of bytes on a track, and r = rotation speed in revolutions per
second.
4. Access time: The total average access time is
Ta = Ts + (1 / 2 r) + (b / r N)
where Ts = average seek time.
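Putting the formulas above together, the average time to read b bytes can be computed as follows. The drive parameters in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def disk_access_time(Ts, r, b, N):
    """Average access time Ta = Ts + 1/(2r) + b/(rN), as defined above.

    Ts = average seek time (s), r = rotation speed (rev/s),
    b = bytes to transfer, N = bytes per track.
    """
    rotational_delay = 1 / (2 * r)   # half a revolution on average
    transfer_time = b / (r * N)      # T = b / (rN)
    return Ts + rotational_delay + transfer_time

# Hypothetical drive: 10 ms seek, 100 rev/s, 1000-byte tracks, 500-byte read:
# Ta = 0.010 + 0.005 + 0.005 = 0.020 s.
Ta = disk_access_time(Ts=0.010, r=100, b=500, N=1000)
```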
RAID
1. RAID is a set of physical disk drives viewed by the operating system as
a single logical drive.
2. Data are distributed across the physical drives of an array.
3. Redundant disk capacity is used to store parity information, which
guarantees data recoverability in case of a disk failure.
Optical memory and magnetic tape are the other two types of external memory.
10.4 Virtual memory
Virtual (or logical) memory is a concept that, when implemented by a
computer and its operating system, allows programmers to use a very large
range of memory or storage addresses for stored data. The computing
system maps the programmer's virtual addresses to real hardware storage
addresses. Usually, the programmer is freed from having to be concerned
about the availability of data storage.
In addition to managing the mapping of virtual storage addresses to real
storage addresses, a computer implementing virtual memory or storage also
manages storage swapping between active storage (RAM) and hard disk or
other high volume storage devices. Data is read in units called "pages" of
sizes ranging from a thousand bytes (actually 1,024 decimal bytes) up to
several megabytes in size. This reduces the amount of physical storage
access that is required and speeds up overall system performance.
The address specified by the program is called a logical address or virtual
address; the memory control circuitry translates it into an address that can
be used to access the physical memory. The set of virtual addresses
constitutes the virtual address space.
The mechanism that translates virtual addresses into physical addresses is
usually implemented by a combination of hardware and software
components. If a virtual address refers to a part of the program or data
space that is currently in the physical memory, then the contents of the
appropriate physical location in the main memory are accessed. Otherwise
the contents must first be brought into a suitable location in the main memory.
The mapping function is implemented by a special memory control unit
called the memory management unit. The mapping function can be changed
during program execution according to system requirements.
The simplest method of translation assumes that all programs and data are
composed of fixed-length units called pages. Each page consists of a block
of words that occupy contiguous locations in the main memory or in the
secondary storage. Pages normally range from 1K to 8K bytes in length.
They form the basic unit of information that is transferred between the main
memory and secondary storage devices.
Virtual address translation method
A virtual address translation method based on the concept of fixed length
pages is shown in Fig. 10.10. Each virtual address generated by the
processor is interpreted as a page number followed by a word number.
Information about the disk or the main memory is kept in a page table in the
main memory.
Fig. 10.10: Virtual memory address translation
[The figure shows the virtual address from the CPU interpreted as a page
number followed by a word number. The page number, added to the contents of
the page table base register, selects an entry in the page table (implemented
in the main memory); each entry holds control bits together with the block
number if the page is present in the main memory, or a pointer to secondary
storage if it is not. The block number and word number together form the
physical address sent to the main memory.]
The starting address of this table is kept in a page table base register. By
adding the page number to the contents of this register, the address of the
corresponding entry in the page table is obtained. The contents of this
location give the starting address of the page if that page currently resides
in the main memory; otherwise they indicate the location where the page is
to be found in the secondary storage. In the latter case the entry in the table
usually points to an area in the main memory where the secondary storage
address of the page is held. Each entry also includes some control bits that
describe the status of the page while it is in the main memory. One control
bit indicates whether the page has been modified while in the main memory.
If the page table is stored in the main memory unit, then two main
memory accesses must be made for every main memory access requested
by the program. This may degrade speed by a factor of two.
However, most systems use a specialized cache memory to speed up the
translation process by storing recently used virtual-to-physical
address translations.
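The translation procedure of Fig. 10.10 can be sketched as below. This is a simplified model: the 4K-byte page size and the table layout are assumptions for illustration (the text allows 1K to 8K bytes), and a real memory management unit would also consult the control bits and a translation cache.

```python
PAGE_SIZE = 4096   # assumed page length; the text allows 1K to 8K bytes

def translate(vaddr, page_table):
    """Map a virtual address to a physical address as in Fig. 10.10.

    page_table maps a page number to (in_memory, block_number); when the
    page is not in main memory, a real system would follow the entry's
    pointer to secondary storage and bring the page in (a page fault).
    """
    page_number = vaddr // PAGE_SIZE   # high-order field of the address
    word_number = vaddr % PAGE_SIZE    # low-order field (offset in page)
    in_memory, block_number = page_table[page_number]
    if not in_memory:
        raise LookupError("page fault on page %d" % page_number)
    return block_number * PAGE_SIZE + word_number

# Page 0 resides in main memory block 5; page 1 is only on secondary storage.
table = {0: (True, 5), 1: (False, None)}
```

Accessing an address in page 0 succeeds immediately, while an access in page 1 would trigger the page fault path.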
Virtual memory increases the effective size of the main memory. Only the
active part of the virtual address space is mapped onto locations in the
physical main memory, whereas the remaining virtual addresses are
mapped onto the bulk storage devices used. During a memory cycle, the
address mapping mechanism (hardware or software) determines whether
the addressed information is in the physical main memory unit. If it is, the
proper information is accessed and execution proceeds. If it is not, a
contiguous block of words containing the desired information is transferred
from the bulk storage to the main memory, displacing some block that is
currently inactive.
10.5 Memory Management in Operating Systems
In the virtual memory discussion we assumed that only one large program is
being executed. If the program and data do not fit into the physical main
memory, then secondary storage holds the overflow, and the operating
system automatically swaps programs between the main memory and
secondary storage.
Memory management routines are part of the operating system of the computer.
The virtual address space is divided into two parts: system space and user
space. Operating system routines reside in the system space and user
application programs reside in the user space.
In a multiuser environment each user has a separate user space with a
separate page table. The memory management unit uses the page table base
register to determine the address of the table to be used in the translation
process. Hence, by changing the contents of this register, the operating
system can switch from one space to another. The physical main memory is
thus shared by the active pages of the system space and several user spaces.
However, only the pages that belong to one of these spaces are accessible
at any given time. In a multiuser environment, no program should be
allowed to modify either the data or the instructions of other programs in
the main memory. Hence protection must be provided.
Such protection can be provided in several ways. Let us first consider the
most basic form of protection. Recall that in the simplest case the processor
has two states, namely the supervisor state and the user state. The processor
is placed in the supervisor state while operating system routines are being
executed, and in the user state to execute user programs. In the user state
some machine instructions, called privileged instructions, cannot be
executed; these include operations such as modifying the page table base
register. Hence a user program is
prevented from accessing the page tables of other user spaces or of the
system space.
It is sometimes desirable for one application program to have access to
certain pages belonging to another program. The operating system can
arrange this by causing these pages to appear in both spaces. The shared
pages will therefore have entries in two different page tables. The control
bits in each table entry can be set differently to control the Read/Write
access privileges granted to each program. The main memory partitioning
methods are fixed-size partitions, unequal-size partitions, and dynamic
partitioning. Paging is more efficient than partitioning: programs are divided
into "logical" chunks known as "pages", which are assigned to available
chunks of memory known as "frames" or "page frames".
10.6 Summary
An intermediate memory called cache memory can be introduced between the
main memory and the CPU. We have introduced the reader to the basic
structure of cache memory along with the mapping functions and
replacement algorithms. Finally, we have touched upon a few external
memory elements and on virtual memory, another method of increasing the
effective size of the memory.
Self Assessment Questions
1. ____________ memory contains the copy of portions of main memory.
2. In ____________ design both CPU and main memory are directly
connected to the system bus.
3. Efficiency of the system that uses cache is ____________, when all the
references are confined to the cache.
4. Data are recorded on and later retrieved from the disk via a conducting
coil named ____________.
5. User application programs reside in the ____________.
10.7 Terminal Questions
1. Explain the replacement algorithms.
2. Discuss the physical characteristics of DISK.
3. Write a note about RAID.
10.8 Answers
Self Assessment Questions
1. cache
2. look-aside design
3. maximum
4. head
5. user space
Terminal Questions
1. Refer Section 10.2
2. Refer Section 10.3
3. Refer Section 10.3
Unit 11 Input / Output Basics
Structure:
11.1 Introduction
Objectives
11.2 External Devices
Classification of external devices
Input / Output problems
11.3 Input / Output Module
I/O Module Function
I/O Module Decisions
Input Output Techniques
11.4 Programmed I/O
I/O commands
I/O instructions
11.5 Interrupt Driven I/O
Basic concepts of an Interrupt
Response of CPU to an Interrupt
Design Issues
Priorities
Interrupt handling
Types of Interrupts
11.6 Summary
11.7 Terminal Questions
11.8 Answers
11.1 Introduction
In addition to the CPU and a set of memory modules, another key element
of a computer system is the set of I/O modules. An I/O module is not just a
mechanical connector that wires a device into the system bus; it contains
some intelligence, that is, logic for performing a communication
function between the peripheral and the bus. We need an I/O module for the
following reasons:
- There is a wide variety of peripherals with various methods of operation.
It is impractical to incorporate the necessary logic within the CPU to
control a range of devices.
- The data transfer rate of peripherals is often much slower than that of
the memory or CPU.
- Peripherals often use different data formats and word lengths than the
computer system to which they are attached.
An I/O module has two major functions:
1) Interface to the CPU and memory via the system bus or central switch
2) Interface to one or more peripheral devices by tailored data links.
Objectives:
By the end of Unit 11, the learners should be able to:
1. Discuss the classification of external devices.
2. Discuss I/O problems and the functions of I/O modules.
3. Explain the concept of programmed I/O.
4. Explain the various I/O instructions.
5. Explain the concept of interrupt driven I/O.
11.2 External Devices
I/O operations are accomplished through a wide assortment of external
devices that provide a means of exchanging data between the external
environment and the computer. An external device attaches to the computer
by a link to an I/O module as shown in figure 11.1. The link is used to
exchange control, status and data between the I/O module and the external
device. An external device connected to an I/O module is often referred to
as a peripheral device or simply a peripheral.
Figure 11.1: Generic Model of an I/O Module
Classification of External devices
External devices can be broadly classified into three categories:
1. Human readable: suitable for communicating with the computer user.
Examples: screen, keyboard, video display terminals (VDT) and printers.
2. Machine readable: suitable for communicating with equipment.
Examples: magnetic disk and tape systems, and the monitoring and control
sensors and actuators used in robotics.
From a functional point of view these devices are part of the memory
hierarchy, but from a structural point of view they are controlled
by I/O modules.
3. Communication: these devices allow a computer to exchange data with
remote devices, which may be machine readable or human readable.
Examples: modem, Network Interface Card (NIC).
The nature of an external device and its interface is indicated in figure 11.2.
The interface to the I/O module takes the form of control, status and data
signals. Data are a set of bits sent to and received from the I/O module.
Control signals determine the function that the device will perform, such as
send data to the I/O module (INPUT or READ), accept data from the I/O
module (OUTPUT or WRITE), report status, or perform some control function
particular to that device. Status signals indicate the state of the device,
such as READY or NOT-READY, to show whether the device is ready for data
transfer.
Control logic associated with the device controls the device's operation in
response to direction from the I/O module. The transducer converts data
from electrical signals into other forms of energy, or vice versa.
Typically a buffer is associated with the transducer to temporarily hold
data being transferred between the I/O module and the external device.
Input / Output Problems
- Wide variety of peripherals
- Delivering different amounts of data
- At different speeds
- In different formats
- All slower than CPU and RAM
Hence the need for I/O modules.
Figure 11.2: An External Device
[The figure shows an external device built from control logic, a buffer and
a transducer. Control signals arrive from, and status signals return to, the
I/O module; data bits pass to and from the I/O module; device-unique data
passes to and from the environment.]
11.3 Input / Output Module
It is the entity within a computer that is responsible for the control of one or
more external devices and for the exchange of data between those devices
and main memory and/or CPU.
Figure 11.3: I/O Module Diagram
[The figure shows the I/O module between a systems bus interface and an
external device interface. Data, address and control lines from the system
bus connect to the input/output logic, a data register and a status/control
register; external device interface logic exchanges data, status and control
signals with each attached device.]
The block diagram of an I/O module is shown in figure 11.3. Thus an I/O
module must have
- an interface to the CPU and memory, and
- an interface to one or more peripherals.
I/O Module Function
The major functions or requirements of an I/O module fall into the following
five categories:
- Control and timing
- CPU communication
- Device communication
- Data buffering
- Error detection
During any period of time, the CPU may communicate with one or more
external devices in unpredictable patterns, depending on the program's need
for I/O. The internal resources, such as main memory and the CPU, must be
shared among a number of activities, including handling data I/O. Thus the
I/O module includes a control and timing requirement to coordinate the flow
of traffic between internal resources and external devices. For example,
the CPU might be involved in a sequence of operations like:
- CPU checks I/O module device status
- I/O module returns device status
- If ready, CPU requests data transfer
- I/O module gets data from device
- I/O module transfers data to CPU
- Variations apply for output, DMA, etc.
The I/O module must have the capability to engage in communication with the
CPU and with the external device. CPU communication involves:
- Command decoding: the I/O module accepts commands from the CPU,
carried on the control bus.
- Data: data are exchanged between the CPU and the I/O module over the
data bus.
- Status reporting: because peripherals are slow, it is important to know
the status of the I/O module, which it can report with status signals.
Commonly used status signals are BUSY and READY. Various other
status signals may be used to report error conditions.
- Address recognition: just as each memory word has an address, there is
an address associated with every I/O device. Thus the I/O module must
recognize a unique address for each peripheral it controls.
The I/O module must also be able to perform device communication. This
communication involves commands, status information, and data. Some of
the essential tasks are listed below:
- Data buffering: the transfer rate into and out of the main memory or CPU
is quite high, while the rate is much lower for most peripherals. The
table 11.1 gives the data rate of various devices. Data coming from main
memory are sent to an I/O module in a rapid burst. The data is buffered
in the I/O module and then sent to the peripheral device at its rate. In the
opposite direction data are buffered so as not to tie up the memory in a
slow transfer operation. Thus the I/O module must be able to operate at
both device and memory speeds.
Table 11.1: Data rates of various devices
- Error detection: the I/O module is often responsible for error detection
and for subsequently reporting errors to the CPU. Two classes of errors
can be identified:
1. Mechanical and electrical malfunctions: errors reported by the device
itself, such as a paper jam or a bad disk track.
2. Unintentional changes to bit patterns: as data is transmitted from the
device to the I/O module, bit patterns might change unintentionally.
Some kind of error-detecting code is often used to detect such
transmission errors.
A common example is the use of a parity bit on each character of data. For
example, an ASCII character code occupies 7 bits of a byte. The eighth bit is
set so that the total number of 1s in the byte is even (even parity) or odd
(odd parity). When a byte is received, the I/O module checks the parity to
determine whether an error has occurred.
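The even-parity scheme described above can be sketched as follows (the function names are ours, for illustration):

```python
def add_even_parity(char):
    """Attach an even-parity bit (bit 7) to a 7-bit ASCII code so that
    the total number of 1s in the byte is even, as described above."""
    code = ord(char) & 0x7F
    parity = bin(code).count("1") & 1    # 1 if the 7-bit code has odd weight
    return code | (parity << 7)

def parity_ok(byte):
    """Receiver-side check: a valid even-parity byte has an even number
    of 1s; any single flipped bit makes the count odd."""
    return bin(byte & 0xFF).count("1") % 2 == 0
```

For example, 'C' (1000011, three 1s) is transmitted as 11000011; if any single bit is flipped in transit, parity_ok detects the error. Note that two flipped bits cancel out, which is why parity catches only odd numbers of bit errors.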
I/O Module Decisions
- Hide or reveal device properties, such as timing details and formats, to
the CPU.
- Support multiple devices or a single device.
- Control device functions, or leave control to the CPU.
- There are also operating system decisions; e.g. Unix treats everything it
can as a file.
Consider the problem of moving a character code from the keyboard to the
processor.
Striking a key stores the corresponding character code in an 8-bit buffer
register associated with the keyboard. Let us call this register DATAIN, as
shown in figure 11.4.
Figure 11.4: Example of an I/O module
To inform the processor that a valid character is in DATAIN, a status control
flag, SIN, is set to 1. A program monitors SIN and when SIN is set to 1, the
processor reads the contents of DATAIN. When the character is transferred
to the processor, SIN is automatically cleared to 0. If a second character is
entered at the keyboard, SIN is again set to 1 and the process repeats.
An analogous process takes place when characters are transferred from the
processor to the display. A buffer register, DATAOUT and a status control
flag, SOUT are used for this transfer. When SOUT equals 1, the display is
ready to receive a character. Under program control, the processor monitors
SOUT, and when SOUT is set to 1, the processor transfers a character code
to DATAOUT. The transfer of a character to DATAOUT clears SOUT to 0;
when the display device is ready to receive a second character, SOUT is
again set to 1. The buffer registers DATAIN and DATAOUT and the status
flags SIN and SOUT are part of circuitry commonly known as a device
interface.
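The DATAIN/SIN handshake just described can be sketched as a simulation. This is a toy model — the class and its internals are our invention, standing in for the device interface circuitry — but the polling loop mirrors the protocol: wait until SIN is 1, read DATAIN, and the read automatically clears SIN to 0.

```python
class KeyboardInterface:
    """Toy model of the DATAIN buffer register and SIN status flag."""

    def __init__(self, keystrokes):
        self._pending = list(keystrokes)  # keys the user will strike
        self.SIN = 0
        self.DATAIN = 0
        self._next_key()

    def _next_key(self):
        if self._pending:                 # striking a key stores its code
            self.DATAIN = ord(self._pending.pop(0))
            self.SIN = 1                  # ... and sets SIN to 1

    def read_datain(self):
        code = self.DATAIN                # processor reads the character,
        self.SIN = 0                      # which clears SIN to 0
        self._next_key()                  # next keystroke, if any
        return code

def read_by_polling(kbd):
    """Program-controlled input: busy-wait on SIN, then read DATAIN."""
    received = []
    while kbd._pending or kbd.SIN:
        while kbd.SIN == 0:               # the wait loop of the text
            pass                          # (never spins in this toy model)
        received.append(chr(kbd.read_datain()))
    return "".join(received)
```

The analogous output path would busy-wait on SOUT before writing DATAOUT.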
An I/O module that takes on most of the detailed processing burden,
presenting a high-level interface to the CPU, is usually referred to as an
I/O channel or I/O processor. An I/O module that is primitive and requires
detailed control is usually referred to as an I/O controller or device
controller. I/O controllers are seen on microcomputers and I/O channels on
mainframes, with minicomputers employing a mixture.
Input Output Techniques
Three techniques are possible for I/O operations:
- Programmed I/O
- Interrupt driven I/O
- Direct Memory Access (DMA)
Table 11.2 indicates the relationship among the three techniques.
With programmed I/O, data are exchanged between the CPU and the I/O
module. The CPU executes a program that gives it direct control of the I/O
operation, including sensing device status, sending a read or write
command, and transferring data. When the CPU issues a command to the I/O
module, it must wait until the I/O operation is complete. If the CPU is
faster than the I/O module, CPU time is wasted.
With interrupt driven I/O, the CPU issues a command to the I/O module and
does not wait until the I/O operation is complete; instead it continues to
execute other instructions. When the I/O module has completed its work, it
interrupts the CPU.
With both programmed I/O and interrupt driven I/O, the CPU is
responsible for extracting data from main memory for output and storing
data in main memory for input.
Table 11.2: I/O Techniques

                                    No Interrupts      Use of Interrupts
I/O-to-memory transfer
  through processor                 Programmed I/O     Interrupt driven I/O
Direct I/O-to-memory transfer                          Direct Memory Access (DMA)
11.4 Programmed I/O
When the CPU is executing a program and encounters an instruction
relating to I/O, it executes that instruction by issuing a command to the
appropriate I/O module. With programmed I/O, the I/O module performs the
requested action and then sets the appropriate bits in the status register.
The I/O module takes no further action to alert the CPU; that is, it does
not interrupt the CPU. Hence it is the responsibility of the CPU to
periodically check the status of the I/O module until it finds that the
operation is complete.
The sequence of actions that takes place with programmed I/O is:
- CPU requests I/O operation
- I/O module performs operation
- I/O module sets status bits
- CPU checks status bits periodically
- I/O module does not inform CPU directly
- I/O module does not interrupt CPU
- CPU may wait or come back later
I/O commands
To execute an I/O-related instruction, the CPU issues an address, specifying
the particular I/O module and external device, and an I/O command. Four
types of I/O commands can be received by the I/O module when it is
addressed by the CPU:
- A control command is used to activate a peripheral and tell it what to do.
For example, a magnetic tape may be directed to rewind or to move forward
a record.
- A test command is used to test various status conditions associated
with an I/O module and its peripherals: the CPU wants to know whether the
peripheral of interest is available for use, whether the most recent I/O
operation is complete, and whether any errors have occurred.
- A read command causes the I/O module to obtain an item of data from the
peripheral and place it in an internal buffer. The CPU then gets the data
item by requesting that the I/O module place it on the data bus.
- A write command causes the I/O module to take an item of data from the
data bus and subsequently transmit it to the peripheral.
A flow chart for the input of a block of data using programmed I/O is shown
in figure 11.5; it reads in a block of data from a peripheral device into
memory.
Fig. 11.5: Input of a block of data using programmed I/O
Data is read in one word at a time. For each word read in, the CPU keeps
checking the status until the word is available in the I/O module's data
register. That is, the possible sequence of I/O commands and actions is:
- CPU issues an address, identifying the module (and the device, if there
is more than one per module)
- CPU issues a command:
  - Control: telling the module what to do, e.g. spin up the disk
  - Test: check status, e.g. power on? error?
  - Read/Write: the module transfers data via its buffer from/to the device
From the flow chart it is clear that the main disadvantage of this technique
is that it is a time-consuming process that keeps the processor busy
unnecessarily.
I/O instructions
Addressing I/O Devices
Under programmed I/O, data transfer is very much like memory access from
the CPU's viewpoint. Each device is given a unique identifier, and CPU
commands contain that identifier (address).
I/O Mapping
When the CPU, main memory, and I/O module share a common bus, two
modes of addressing are possible.
1. Memory mapped I/O
- Devices and memory share an address space
- I/O looks just like memory read/write
- No special commands are needed for I/O
- The large selection of memory access commands is available
2. Isolated I/O
- Separate address spaces
- Needs I/O or memory select lines
- Special commands for I/O, offering only a limited set
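Memory-mapped I/O can be illustrated with a small simulation. Here a toy address space (our invention, with 0xFF00 chosen arbitrarily as the display's DATAOUT register) shows why no special I/O commands are needed: the very same store operation that writes RAM drives the device.

```python
DATAOUT_ADDR = 0xFF00   # hypothetical address assigned to the device register

class MappedBus:
    """Toy memory-mapped address space: ordinary RAM everywhere except
    that a store to DATAOUT_ADDR goes to the display, not to memory."""

    def __init__(self):
        self.ram = {}
        self.displayed = []

    def store(self, addr, value):
        if addr == DATAOUT_ADDR:               # looks like any memory write...
            self.displayed.append(chr(value))  # ...but reaches the device
        else:
            self.ram[addr] = value

    def load(self, addr):
        return self.ram.get(addr, 0)

bus = MappedBus()
for ch in "OK":
    bus.store(DATAOUT_ADDR, ord(ch))  # the same Store used for memory
bus.store(0x0100, 42)                 # an ordinary memory write
```

Under isolated I/O, by contrast, the device register would live in a separate address space reached only through special I/O instructions.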
Consider an example: the processor can monitor the keyboard status flag
SIN and transfer a character from DATAIN to register R1 using the following
sequence of operations:
READWAIT Branch to READWAIT if SIN = 0
Input from DATAIN to R1
The Branch operation is usually implemented by two machine instructions.
The first instruction tests the status flag and the second performs the branch.
Although the details vary from computer to computer, the main idea is that
the processor monitors the status flag by executing a short wait loop and
proceeds to transfer the input data when SIN is set to 1 as a result of a key
being struck. The Input operation resets SIN to 0.
Another example is transferring output to the display. The sequence of
instructions is:
WRITEWAIT Branch to WRITEWAIT if SOUT = 0.
Output from R1 to DATAOUT
Again, the Branch operation is normally implemented by two machine
instructions. The wait loop is executed repeatedly until the status flag SOUT
is set to 1 by the display when it is free to receive a character. The Output
operation transfers a character from R1 to DATAOUT to be displayed, and it
clears SOUT to 0.
We assume that the initial state of SIN is 0 and the initial state of SOUT is 1.
This initialization is normally performed by the device control circuits when
the devices are placed under computer control before program execution
begins. Until now, we have assumed that the addresses issued by the
processor to access instructions and operands always refer to memory
locations. Many computers use an arrangement called memory-mapped I/O
in which some memory address values are used to refer to peripheral
device buffer registers, such as DATAIN and DATAOUT.
Thus, no special instructions are needed to access the contents of these
registers; data can be transferred between these registers and the
processor using instructions that we have already discussed, such as Move,
Load or Store. For example, the contents of the keyboard character buffer
DATAIN can be transferred to register R1 in the processor by the instruction
MoveByte DATAIN, R1
Similarly, the contents of register R1 can be transferred to DATAOUT by the
instruction
MoveByte R1, DATAOUT
The status flags SIN and SOUT are automatically cleared when the buffer
registers DATAIN and DATAOUT are referenced, respectively. The MoveByte
operation code signifies that the
operand size is a byte, to distinguish it from the operation code Move that
has been used for word operands. We have established that the two data
buffers may be addressed as if they were two memory locations. It is
possible to deal with the status flags SIN and SOUT in the same way, by
assigning them distinct addresses. However, it is more common to include
SIN and SOUT in device status registers, one for each of the two devices.
Let us assume that bit b3 in registers INSTATUS and OUTSTATUS
corresponds to SIN and SOUT, respectively. The read operation just
described may now be implemented by the machine instruction sequence
READWAIT  Testbit #3, INSTATUS
          Branch=0 READWAIT
          MoveByte DATAIN, R1
The write operation may be implemented as:
WRITEWAIT Testbit #3, OUTSTATUS
          Branch=0 WRITEWAIT
          MoveByte R1, DATAOUT
The Testbit instruction tests the state of one bit in the destination location,
where the bit position to be tested is indicated by the first operand. If the bit
tested is equal to 0, then the condition of the branch instruction is true, and
a branch is made to the beginning of the wait loop. When the device is
ready, that is, when the bit tested becomes equal to 1, the data are read
from the input buffer or written into the output buffer.
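The wait-loop idea can be sketched in Python. This is a minimal simulation, not real device code: the `KeyboardDevice` class, its `SIN` flag and `DATAIN` buffer are hypothetical stand-ins for the hardware registers described above, and reading `DATAIN` clears `SIN` just as the text specifies.

```python
# Sketch of program-controlled (polling) I/O, assuming a hypothetical
# keyboard model with an SIN status flag and a DATAIN buffer register.

class KeyboardDevice:
    def __init__(self, chars):
        self._chars = list(chars)
        self.SIN = 1 if self._chars else 0   # 1 => a character is ready

    @property
    def DATAIN(self):
        ch = self._chars.pop(0)              # reading DATAIN resets SIN
        self.SIN = 1 if self._chars else 0
        return ch

def read_char(kbd):
    while kbd.SIN == 0:                      # READWAIT: branch while SIN = 0
        pass                                 # busy-wait (wastes CPU time)
    return kbd.DATAIN                        # Input from DATAIN to "R1"

kbd = KeyboardDevice("A")
print(read_char(kbd))                        # -> A
```

The `while` loop corresponds to the two-instruction Testbit/Branch pair: the processor does nothing useful until the flag is set.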
The program shown in Figure 11.6 uses these two operations to read a line
of characters typed at a keyboard and send them out to a display device. As
the characters are read in, one by one, they are stored in a data area in the
memory and then echoed back out to the display. The program finishes
when the carriage return character, CR, is read, stored and sent to the
display. The address of the first byte location of the memory data area
where the line is to be stored is LOC. Register R0 is used to point to this
area, and it is initially loaded with the address LOC by the first instruction in
the program. R0 is incremented for each character read and displayed by
the Autoincrement addressing mode used in the Compare instruction.
Figure 11.6: Program to read from a keyboard and write output to a display
device
11.5 Interrupt Driven I/O
Using Program-controlled I/O requires continuous involvement of the
processor in the I/O activities. It is desirable to avoid wasting processor
execution time. An alternative is for the CPU to issue an I/O command to a
module and then go on with other work. The I/O module then interrupts the
CPU to request service when it is ready to exchange data. The CPU
executes the data transfer and then resumes its former processing.
Because it is based on interrupts, this technique improves the utilization of
the processor.
An interrupt is more than a simple mechanism for coordinating I/O transfers.
In a general sense, interrupts enable transfer of control from one program to
another to be initiated by an event that is external to a computer. Execution
of the interrupted program resumes after completion of execution of the
interrupt service routine. The concept of interrupts is useful in operating
systems and in many control applications where processing of certain
routines has to be accurately timed relative to the external events. The latter
type of application is generally referred to as real-time processing.
The operations listed can be better understood with an example to input a
block of data. A flow chart using this technique for input of a block of data is
as shown in figure 11.7.
Figure 11.7: Input of a block of data using interrupt-driven I/O
With interrupt-driven I/O, the CPU issues a read command and the I/O
module gets data from the peripheral while the CPU does other work. The
I/O module then interrupts the CPU; the CPU checks the status and, if there
is no error (that is, the device is ready), requests the data, which the I/O
module transfers. The CPU then reads the data and stores it in main
memory.
Thus interrupt-driven I/O:
Overcomes CPU waiting.
Requires no repeated CPU checking of the device.
Lets the I/O module interrupt only when it is ready.
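The sequence above can be sketched as a small Python simulation. The `IOModule` class and its callback-style "interrupt" are illustrative assumptions, not real hardware APIs; the point is that the CPU starts the operation, continues with other work, and the transfer completes only when the module signals readiness.

```python
# Sketch of interrupt-driven input, assuming a hypothetical I/O module
# that invokes a registered service routine when its data is ready.

class IOModule:
    def __init__(self):
        self.handler = None
        self.buffer = None

    def read_command(self, handler):
        self.handler = handler        # CPU issues read command, then continues

    def device_ready(self, data):     # device completes: raise "interrupt"
        self.buffer = data
        self.handler(self)            # transfer of control to the handler

memory = []

def service_routine(module):          # interrupt service routine
    memory.append(module.buffer)      # CPU reads data, stores it in memory

mod = IOModule()
mod.read_command(service_routine)     # CPU starts the I/O...
# ...CPU does other useful work here, no polling loop...
mod.device_ready("X")                 # device interrupts when ready
print(memory)                         # -> ['X']
```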
Basic concepts of an Interrupt
An interrupt is an exception condition in a computer system caused by an
event external to the CPU. Interrupts are commonly used in I/O operations
by a device interface (or controller) to notify the CPU that it has completed
an I/O operation.
An interrupt is indicated by a signal sent by the device interface to the CPU
via an interrupt request line (on an external bus). This signal notifies the
CPU that the signaling interface needs to be serviced. The signal is held
until the CPU acknowledges or otherwise services the interface from which
the interrupt originated.
Note: interrupts may be generated by interfaces or devices which are not
involved in I/O. For example, if a computer system contains a clock (typically
called a "timer") outside of the CPU, that clock may signal its "ticks" by
interrupts. However, exception conditions or signals internal to the CPU,
such as occur when there is an attempt to execute a non-existent instruction,
are typically called traps, not interrupts.
Response of CPU to an Interrupt
The CPU checks periodically to determine if an interrupt signal is pending.
This check is usually done at the end of each instruction, although some
modern machines allow for interrupts to be checked for several times during
the execution of very long instructions.
When the CPU detects an interrupt, it then saves its current state (at least
the PC and the Processor Status Register containing condition codes); this
state information is usually saved in memory. After the interrupt has been
serviced, this state information is restored in the CPU and the previously
executing software resumes execution as if nothing had happened.
How is an Interrupt Serviced?
The CPU runs a program called an interrupt handler. Interrupt handler
software may be general and be able to service all device interfaces, but
typically an interrupt handler is designed to service an interface for exactly
one type of device (e.g., a terminal or a disk).
The interrupt handler must make it possible for the software which was
executing in the CPU and which was “interrupted" to resume its execution
after the interrupt is serviced as if nothing had happened. Since the CPU
hardware saves only a minimal amount of information when it saves the
state of the CPU, the interrupt handler is responsible for saving and
restoring data such as the contents of the general purpose registers which it
uses.
Design Issues
How do you identify the module issuing the interrupt?
How do you deal with multiple interrupts?
i.e. an interrupt handler being interrupted
Identifying Interrupting Module
When the CPU determines that an interrupt has occurred, it must be
determined what device interface signaled the interrupt. Two primary
approaches are used to do this. In some systems, a general interrupt
handler queries each device interface. When one acknowledges positively
that it originated the interrupt, then that device is serviced.
A more efficient alternative approach is to use vectored interrupts. When
vectored interrupts are used, the interface signaling the interrupt will
“identify" itself by sending information (its "vector") to the CPU. That
information will permit the CPU to select the correct interrupt handler. In
actuality, that information may be a pointer to where the starting address of
the interrupt handler is stored.
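Vectored dispatch can be sketched as a table lookup. The handler names and vector numbers below are invented for illustration; in a real machine the table holds starting addresses of interrupt service routines, which Python functions stand in for here.

```python
# Sketch of vectored-interrupt dispatch: the vector supplied by the
# interface indexes a table of handlers (illustrative names only).

def disk_handler():
    return "disk serviced"

def terminal_handler():
    return "terminal serviced"

vector_table = {0: disk_handler, 1: terminal_handler}

def take_interrupt(vector):
    handler = vector_table[vector]    # vector selects the correct handler
    return handler()

print(take_interrupt(1))              # -> terminal serviced
```

No querying of each interface is needed: the vector alone identifies which routine to run.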
Multiple Interrupts
Each interrupt line has a priority
Higher priority lines can interrupt lower priority lines
If bus mastering is used, only the current bus master can interrupt
Priorities
Because several interrupts may be found pending when the CPU checks for
an interrupt, priorities may be used to determine in which order the
interrupts are serviced. When several interrupts are pending, the one of
highest priority is always serviced next. Priorities are assigned to device
interfaces on the basis of any of several factors.
Generally faster devices have higher priorities assigned to their interfaces;
disks transmit data at a faster rate than terminals so usually have higher
priorities. Also, if a device interface is not serviced in time, incoming data
may be written over by later data; thus devices such as sensors which do
not have a means to recover lost data (without manual intervention) may be
given fairly high priorities.
Priorities may be used to control the nesting of interrupts. In this case, the
CPU is itself assigned a priority depending on what is executing; typical user
programs usually run at the lowest priority, while an interrupt handler may
run at the priority assigned to the interface it is servicing. Interrupts whose
priority is lower than (or equal to) the current CPU priority are masked
out – they are simply not noticeable
to the CPU. This allows "more important" device interfaces to interrupt the
servicing of "less important" device interfaces, while "less important"
interrupts must await the completion of servicing "more important" ones.
Furthermore, device interfaces can be prevented from interrupting their own
interrupt handlers. In addition, all interrupts can be blocked out by raising
the CPU priority level to its highest possible value; this would allow for a
critical section of instructions to be executed without “interruption." However,
this type of mechanism should only be used for very short segments of code
and for very good reasons (e.g., during context switches).
Priorities may be implemented by providing several interrupt request lines in
the external bus; each device is then attached to the line corresponding to
the interrupt priority level to which it has been assigned. Furthermore, if
several devices are attached to the same interrupt request line via daisy
chaining, the one closest to the CPU will be serviced first.
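The masking rule can be captured in a few lines. This is a simplified model under one assumption stated here: an interrupt is accepted only if its priority is strictly higher than the current CPU priority, exactly as described above.

```python
# Sketch of priority masking: of the pending interrupts, only those with
# priority strictly higher than the CPU's current priority are eligible.

def accept(pending, cpu_priority):
    """Return the highest-priority unmasked pending interrupt,
    or None if every pending interrupt is masked out."""
    eligible = [p for p in pending if p > cpu_priority]
    return max(eligible) if eligible else None

print(accept([2, 5, 3], cpu_priority=3))   # -> 5    (5 > 3, so it preempts)
print(accept([1, 2], cpu_priority=4))      # -> None (all masked out)
```

Raising `cpu_priority` to the maximum value blocks every interrupt, which models the "critical section" mechanism mentioned above.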
Interrupt Handling
We have already introduced hardware interrupts, by which a hardware
device can interrupt the normal operation of the CPU to signal that some
significant or time-critical event has occurred (e.g., a character arriving over
a serial line or an alarm event). The complete cycle of hardware and
software operations involved in a typical interrupt, from signaling to
resolution, are listed below:
Hardware Actions
The device controller asserts an interrupt line on the system bus to start
the interrupt sequence.
As soon as the CPU is prepared to handle the interrupt, it asserts an
interrupt acknowledgement signal on the bus.
When the device controller sees that its interrupt signal has been
acknowledged it puts a small integer on the data lines to identify itself;
this is the "interrupt vector".
The CPU removes the interrupt vector from the bus and saves it
temporarily.
The CPU pushes the program counter and the Program Status Word
(PSW, or FLAGS register) onto the stack.
The CPU then locates a new program counter value using the interrupt
vector as an index into a (operating system) table at a well-defined point
in memory. This new program counter starts the interrupt service routine
(normally part of the device driver) for the device causing the interrupt.
Software Actions
The interrupt service routine saves all registers so that they can be
restored later. This may be on the stack or in a system table.
The exact device causing the interrupt is identified (e.g. by polling) -
each interrupt vector may in general be shared by a number of devices.
Any other information about the interrupt, e.g. status codes, can now be
read in.
If an I/O error has occurred, it can be handled here.
The interrupt is responded to in a device specific manner. **
If necessary, a special code may be output to the device or interrupt
controller to indicate that the interrupt has been handled.
The saved register values are restored.
Execute a "Return from Interrupt" instruction, which restores the PC and
PSW from when the interrupt occurred.
The program which was interrupted continues executing from exactly
where it left off.
The step (**) above is where the "real" work will be done, e.g. reading in a
character from a serial line into a local buffer or signaling to the rest of the
program that some error or significant change in status has occurred.
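The save/restore discipline around an interrupt can be sketched with a toy CPU state. The two-element `(pc, psw)` state and the `interrupt_and_resume` helper are illustrative assumptions; the essential point is that what the hardware pushes, the Return-from-Interrupt pops, so execution resumes exactly where it left off.

```python
# Sketch of state saving around an interrupt, assuming a toy CPU state
# of just (pc, psw). Push on entry, run the handler, pop on return.

stack = []

def interrupt_and_resume(state, isr):
    stack.append(state)               # hardware: push PC and PSW
    isr()                             # run the interrupt service routine
    return stack.pop()                # Return-from-Interrupt: restore PC, PSW

state = (1000, "flags")               # program executing at PC = 1000
resumed = interrupt_and_resume(state, lambda: None)
print(resumed == state)               # -> True: resumes where it left off
```

A real handler would additionally save and restore any general-purpose registers it uses, since the hardware saves only this minimal state.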
Figure 11.8: Connections between devices and the interrupt controller
actually use the bus lines instead of dedicated wires
Types of Interrupts
An interrupt is an event that causes the execution of one program to be
suspended and another program to be executed. So far we have assumed
that this event is a request from an I/O device, received in the form of a
hardware signal over the computer bus. In fact, there
are many uses for interrupts other than for controlling I/O transfers, some of
which we will now describe.
Recovery from errors
Computers use a variety of techniques to ensure that all hardware
components are operating properly. For example, many computers include
a parity check code in the main memory, which allows detection of errors in
the stored data. If an error occurs, the control hardware detects it and
informs the CPU by raising an interrupt. The CPU may also interrupt a
program if it detects an error or an unusual condition while executing the
instructions of this program. For example, the OP-code field of an instruction
may not correspond to any legal instruction, or some instruction may
attempt a division by zero.
When an interrupt is initiated as a result of such causes, the CPU proceeds
exactly in the same manner as in the case of an I/O interrupt request. It
suspends the program being executed and starts an interrupt service routine.
The interrupt service routine should take appropriate action to recover from
the error, if possible or inform the user about it.
Debugging
Another important use of interrupts is as an aid in debugging programs.
System software usually includes a program called a debugger, which helps
the programmer to find errors in the program. The debugger uses interrupts
to provide two important facilities: trace and break points. A trace facility
causes an interrupt to occur after execution of every instruction in a program
that is being debugged. The interrupt service routine for a trace interrupt
starts the execution of the debugging routine, which allows the user to
examine the contents of registers, memory locations and so on. On return
from the debugging routine, the next instruction is executed, then the
debugging routine is activated again. During the execution of the debugging
routine trace interrupts are disabled.
Break points provide a similar facility, except that the program being
debugged is interrupted at only specific points selected by the user. An
instruction called trap or software interrupt is usually provided for this
purpose. Execution of this instruction results in exactly the same actions as
when a hardware interrupt request is received. Thus if the user wishes to
interrupt a program after execution of the instruction i the debugging routine
replaces instruction i+1 with a software interrupt instruction. When the
program is executed and reaches that point, it is interrupted and the
debugging routine is activated. Later, when the user is ready to continue
executing the program being debugged, the debugging routine restores the
instruction that was at location i+1 and executes a Return-from-interrupt
instruction.
Communication between Programs
Software interrupt instructions are used by the operating system to
communicate with and control the execution of other programs.
11.6 Summary
An I/O module is a key element of a computer system. An I/O module has
two major functions: first one to interface with the CPU and memory via the
system bus or central switch and second to interface with one or more
peripheral devices by tailored data links. Basic I/O operations and
Programmed I/O, Interrupt-driven I/O, Direct memory access (DMA)
techniques are also discussed.
Programmed I/O does not interrupt the processor: the processor must
periodically check the status of the I/O module until the task is completed,
as illustrated by the character transfers between the processor and the
keyboard and display devices. Interrupt-driven I/O allows the processor to
do other tasks while waiting for the I/O device to send an interrupt, and is
therefore more efficient than programmed I/O.
Self Assessment Questions
1. External devices can be broadly classified into ___________ categories.
2. ___________ is a device that allows a computer to exchange data with
remote devices.
3. The data rate of a mouse is ___________.
4. In ___________ I/O, the I/O module doesn’t interrupt the CPU.
5. In ___________, devices and memory share an address space.
11.7 Terminal Questions
1. List the classification of external devices.
2. Explain I/O module in detail.
3. Explain the concept of programmed I/O.
4. Discuss the working principle of an interrupt.
5. Explain how the CPU responds to an interrupt.
11.8 Answers
Self Assessment Questions
1. two
2. modem or NIC
3. 100 bytes / sec
4. programmed
5. memory mapped I/O
Terminal Questions
1. Refer Section 11.2
2. Refer Section 11.3
3. Refer Section 11.4
4. Refer Section 11.5
5. Refer Section 11.5
Computer Organization and Architecture Unit 12
Sikkim Manipal University - DDE Page No. 297
Unit 12 Direct Memory Access
Structure:
12.1 Introduction
Objectives
12.2 Direct Memory Access
DMA Function and Operation
DMA Configurations
12.3 DMA Controller
DMA Transfer Types
DMA Transfer modes
DMA Controller Operation
Advantages
12.4 Synchronization Requirements for DMA and Interrupts
Synchronization with Interrupts
Synchronization with DMA
12.5 Summary
12.6 Terminal Questions
12.7 Answers
12.1 Introduction
Direct Memory Access capabilities are provided by some computer bus
architectures that allow data to be sent directly from an attached device
(such as a disk drive) to the memory on the computer's motherboard. The
microprocessor is freed from involvement with the data transfer, thus
speeding up overall computer operation.
Objectives:
By the end of Unit 12, the learners should be able to explain the DMA
function, operation and configurations.
12.2 Direct Memory Access
Interrupt driven and programmed I/O require active CPU intervention.
Transfer rate is limited.
CPU is tied up.
DMA is the answer.
Usually a specified portion of memory is designated as an area to be used
for direct memory access. In the ISA bus standard, up to 16 megabytes of
memory can be addressed for DMA. The EISA and Micro Channel
Architecture Standards allow access to the full range of memory addresses
(assuming they're addressable with 32 bits). Peripheral Component
Interconnect accomplishes DMA by using a bus master (with the
microprocessor "delegating" I/O control to the PCI controller).
DMA Function and Operation
The DMA module is as shown in figure 12.1, which is capable of mimicking
the CPU.
Figure 12.1: Typical DMA block diagram
Consider an example of input of a block of data using DMA access. The flow
chart is given in figure 12.2.
Figure 12.2: Input of a block of data using DMA access
When the CPU wishes to read or write a block of data, it issues a command
to the DMA module and gives the following information:
CPU tells DMA controller:-
Read/Write
Device address
Starting address of memory block for data
Amount of data to be transferred
The CPU carries on with other work.
Thus the DMA controller takes over the I/O work from the CPU.
The DMA module transfers the entire block of data,
One word at a time, directly to or from memory, without going through
CPU.
When the transfer is complete
DMA controller sends interrupt when finished
Thus CPU is involved only at the beginning and at the end of the transfer.
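The setup-and-interrupt pattern above can be sketched as follows. The `DMAController` and `Disk` classes and their method names are hypothetical; the sketch shows the four parameters the CPU supplies (direction, device, start address, count) and that the CPU is involved only at setup and at the completion interrupt.

```python
# Sketch of the CPU programming a hypothetical DMA controller, which
# then moves a whole block to memory and interrupts when finished.

class DMAController:
    def setup(self, direction, device, start, count, on_done):
        self.direction, self.device = direction, device
        self.addr, self.count, self.on_done = start, count, on_done

    def run(self, memory):                    # performed by the controller,
        for i in range(self.count):           # one word at a time, without
            memory[self.addr + i] = self.device.read_word()  # the CPU
        self.on_done()                        # interrupt when finished

class Disk:
    def __init__(self, words):
        self._words = iter(words)

    def read_word(self):
        return next(self._words)

memory = [0] * 8
done = []
dma = DMAController()
dma.setup("read", Disk([7, 8, 9]), start=2, count=3,
          on_done=lambda: done.append(True))
dma.run(memory)                               # CPU would be free during this
print(memory, done)                           # -> [0, 0, 7, 8, 9, 0, 0, 0] [True]
```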
DMA Configurations
The DMA mechanism can be configured in a variety of ways. Some of the
common configurations are discussed here.
1. Single Bus Detached DMA
In this configuration all modules share the same system bus. The block
diagram of a single bus detached DMA is as shown in figure 12.3. The DMA
module that is mimicking the CPU uses the programmed I/O to exchange
the data between the memory and the I/O module through the DMA module.
This scheme may be inexpensive but is clearly inefficient. The number of
bus cycles is quite substantial.
Features of this configuration are:
Single Bus, Detached DMA controller
Each transfer uses bus twice
I/O to DMA then DMA to memory
CPU is suspended twice
Figure 12.3: Block diagram of single bus detached DMA
2. Single Bus, integrated DMA
Here there is a path between DMA module and one or more I/O modules
that do not include the system bus. The block diagram of single bus
Integrated DMA is as shown in figure 12.4. The DMA logic can actually be
considered as part of an I/O module, or there may be a separate module
that controls one or more I/O modules.
Figure 12.4: Block diagram of single bus integrated DMA
The features of this configuration can be considered as:
Single Bus, Integrated DMA controller
Controller may support >1 device
Each transfer uses bus once
DMA to memory
CPU is suspended once
3. DMA using an I/O bus
One further step of the concept of integrated DMA is to connect I/O modules
to DMA controller using a separate bus called I/O bus. This reduces the
number of I/O interfaces in the DMA module to one and provides for an
easily expandable configuration. The block diagram of DMA using I/O bus is
as shown in figure 12.5. Here the system bus that the DMA shares with
CPU and main memory is used by DMA module only to exchange data with
memory. And the exchange of data between the DMA module and the I/O
modules takes place, off the system bus that is through the I/O bus.
Figure 12.5: Block diagram of DMA with I/O bus and system bus
The features of this configuration are:
Separate I/O Bus
Bus supports all DMA enabled devices
Each transfer uses bus once
DMA to memory
CPU is suspended once
12.3 DMA Controller
DMA has been a built-in feature of PC architecture since the introduction of
the original IBM PC. PC-based DMA was used for floppy disk I/O in the
original PC and for hard disk I/O in later versions. PC-based DMA
technology, along with high-speed bus technology, is driven by data storage,
communications and graphics needs – all of which require the highest rates
of data transfer between system memory and I/O devices. Data acquisition
applications have the same needs and therefore can take advantage of the
technology developed for larger markets. This section introduces DMA
controller terminology and explains the basic operation of a PC - based
DMA controller along with common modes of operation. Key terminology is
italicized.
A DMA controller is a device, usually peripheral to a computer's CPU, that is,
programmed to perform a sequence of data transfers on behalf of the CPU.
A DMA controller can directly access memory and is used to transfer data
from one memory location to another, or from an I/O device to memory and
vice versa. A DMA controller manages several DMA channels, each of
which can be programmed to perform a sequence of these DMA transfers.
Devices, usually I/O peripherals, that acquire data that must be read (or
devices that must output data and be written to) signal the DMA controller to
perform a DMA transfer by asserting a hardware DMA request signal. A
DMA request signal for each channel is routed to the DMA controller. This
signal is monitored and responded to in much the same way as a processor
handles interrupts. When the DMA controller sees a DMA request, the DMA
controller responds by performing one or many data transfers from that I/O
device into system memory or vice versa. Channels must be enabled by the
processor for the DMA controller to respond to DMA requests. The number
of transfers performed, transfer modes used and memory locations
accessed depend on how the DMA channel is programmed.
A DMA controller typically shares the system memory and I/O bus with the
CPU and has both bus master and slave capability. Figure 12.6 shows the
DMA controller architecture and how the DMA controller interacts with the
CPU. In bus master mode, the DMA controller acquires the system bus
(address, data and control lines) from the CPU to perform the DMA transfers.
Because the CPU releases the system bus for the duration of the transfer,
the process is sometimes referred to as cycle stealing. However, this term is
no longer appropriate in the context of the new high-performance
architectures that are appearing in personal computers. These architectures
use cache memory for the CPU, which enables the DMA controller to
operate in parallel with the CPU to some extent.
Figure 12.6: DMA controller
In bus slave mode, the DMA controller is accessed by the CPU, which
programs the DMA controller's internal registers to set up DMA transfers.
The internal registers consist of source and destination address registers
and transfer count registers for each DMA channel, as well as control and
status registers for initiating, monitoring and sustaining the operation of the
DMA controller.
DMA Transfer Types
DMA controllers vary as to the type of DMA transfers and the number of
DMA channels they support. The two types of DMA transfers are flyby DMA
transfers and fetch-and-deposit DMA transfers. The three common transfer
modes are single, block, and demand transfer modes. These DMA transfer
types and modes are described in the following sections.
1. Flyby DMA transfer:
It is the fastest DMA transfer type and is also referred to as a single-cycle
or single-address transfer. In a flyby DMA transfer, a single bus operation is
used to accomplish the transfer, with data read from the source and written to the
destination simultaneously. In flyby operation, the device requesting service
asserts a DMA request on the appropriate channel request line of the DMA
controller. The DMA controller responds by gaining control of the system
bus from the CPU and then issuing the pre-programmed memory address.
Simultaneously, the DMA controller sends a DMA acknowledge signal to the
requesting device. This signal alerts the requesting device to drive the data
onto the system data bus or to latch the data from the system bus,
depending on the direction of the transfer.
In other words, a flyby DMA transfer looks like a memory read or write cycle
with the DMA controller supplying the address and the I/O device reading or
writing the data. Because flyby DMA transfers involve a single memory cycle
per data transfer, these transfers are very efficient; however, memory-to-
memory transfers are not possible in this mode. Figure 12.7 shows the flyby
DMA transfer signal protocol.
Figure 12.7: Flyby DMA
2. Fetch-and-deposit DMA transfer:
The second type of DMA transfer is referred to as a dual-cycle, dual-
address, flow-through, or fetch-and-deposit DMA transfer. As these names
imply, this type of transfer involves two memory or I/O cycles. The data
being transferred is first read from the I/O device or memory into a
temporary data register internal to the DMA controller. The data is then
written to the memory or I/O device in the next cycle. Figure 12.8 shows the
fetch-and-deposit DMA transfer signal protocol. Although inefficient because
the DMA controller performs two cycles and thus retains the system bus
longer, this type of transfer is useful for interfacing devices with different
data bus sizes.
For example, a DMA controller can perform two 16-bit read operations from
one location followed by a 32-bit write operation to another location. A DMA
controller supporting this type of transfer has two address registers per
channel (source address and destination address) and bus-size registers, in
addition to the usual transfer count and control registers. Unlike the flyby
operation, this type of DMA transfer is suitable for both memory-to-memory
and I/O transfers.
Figure 12.8: Fetch-and-deposit DMA transfer
DMA Transfer Modes
In addition to DMA transfer types there are different transfer modes. Most
common transfer modes are single, block and demand modes.
Single transfer mode
Single transfer mode transfers one data value for each DMA request
assertion. This mode is the slowest method of transfer because it requires
the DMA controller to arbitrate for the system bus with each transfer. This
arbitration is not a major problem on a lightly loaded bus, but it can lead to
latency problems when multiple devices are using the bus.
Block transfer mode
For block mode transfers, the DMA controller performs the entire DMA
sequence as specified by the transfer count register at the fastest possible
rate in response to a single DMA request from the I/O device.
Demand transfer mode
For demand mode transfers, the DMA controller performs DMA transfers at
the fastest possible rate as long as the I/O device asserts its DMA request.
When the I/O device deasserts its DMA request, transfers are held off.
Block and demand transfer modes increase system throughput by allowing
the DMA controller to perform multiple transfers once it has gained the
bus.
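The practical difference between these modes is how often the controller must arbitrate for the bus. A rough illustrative model (the mode names follow the text; the function itself is hypothetical):

```python
# Hypothetical cost model: count how many times the DMA controller must
# arbitrate for the system bus to move `transfer_count` data items.
def count_arbitrations(transfer_count, mode, bursts=1):
    if mode == "single":
        return transfer_count          # re-arbitrate for every single datum
    if mode == "block":
        return 1                       # one grant covers the whole sequence
    if mode == "demand":
        return bursts                  # one grant per DMA-request burst
    raise ValueError("unknown mode")

print(count_arbitrations(256, "single"))             # 256
print(count_arbitrations(256, "block"))              # 1
print(count_arbitrations(256, "demand", bursts=4))   # 4
```

Single mode pays the arbitration cost on every transfer, which is why it is the slowest; block and demand modes amortize that cost over many transfers.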
DMA Controller Operation
For each channel, the DMA controller saves the programmed address and
count in the base registers and maintains copies of the information in the
current address and current count registers, as shown in Figure 12.6. Each
DMA channel is enabled and disabled via a DMA mask register. When DMA
is started by writing to the base registers and enabling the DMA channel, the
current registers are loaded from the base registers. With each DMA
transfer, the value in the current address register is driven onto the address
bus, and the current address register is automatically incremented or
decremented.
The current count register determines the number of transfers remaining
and is automatically decremented after each transfer. When the value in the
current count register goes from 0 to -1, a terminal count (TC) signal is
generated, which signifies the completion of the DMA transfer sequence.
This termination event is referred to as reaching terminal count. DMA
controllers often generate a hardware TC pulse during the last cycle of a
DMA transfer sequence. This signal can be monitored by the I/O devices
participating in the DMA transfers.
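The register behavior described above can be modeled with a short simulation. This is a simplified sketch in the spirit of 8237-style controllers; the class and method names are invented for illustration:

```python
# Illustrative model of one DMA channel. Enabling the channel loads the
# current registers from the base registers; each transfer drives the
# current address and decrements the current count; a 0 -> -1 transition
# of the count generates terminal count (TC).
class DMAChannel:
    def __init__(self, base_address, base_count):
        self.base_address = base_address
        self.base_count = base_count
        self.current_address = base_address
        self.current_count = base_count
        self.enabled = False               # DMA mask register bit

    def enable(self):
        self.current_address = self.base_address   # load current from base
        self.current_count = self.base_count
        self.enabled = True

    def transfer_one(self):
        """Perform one transfer; return True when terminal count is reached."""
        assert self.enabled, "channel is masked"
        self.current_address += 1          # address auto-increments
        self.current_count -= 1            # count decrements per transfer
        if self.current_count < 0:         # count went from 0 to -1: TC
            self.enabled = False
            return True                    # TC pulse on the last cycle
        return False

ch = DMAChannel(base_address=0x1000, base_count=2)   # count+1 = 3 transfers
ch.enable()
results = [ch.transfer_one() for _ in range(3)]
print(results)                  # [False, False, True] -- TC on the last transfer
print(hex(ch.current_address))  # 0x1003
```

Note the 8237-style convention that a programmed count of N yields N+1 transfers, because TC is generated when the count goes from 0 to -1.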
DMA controllers require reprogramming when a DMA channel reaches TC.
Thus, DMA controllers require some CPU time, but far less than is required
for the CPU to service device I/O interrupts. When a DMA channel reaches
TC, the processor may need to reprogram the controller for additional DMA
transfers. Some DMA controllers interrupt the processor whenever a
channel terminates. DMA controllers also have mechanisms for
automatically reprogramming a DMA channel when the DMA transfer
sequence completes. These mechanisms include auto initialization and
buffer chaining.
The auto initialization feature repeats the DMA transfer sequence by
reloading the DMA channel's current registers from the base registers at the
end of a DMA sequence and re-enabling the channel. Buffer chaining is
useful for transferring blocks of data into non-contiguous buffer areas or for
handling double-buffered data acquisition. With buffer chaining, a channel
interrupts the CPU and is programmed with the next address and count
parameters while DMA transfers are being performed on the current buffer.
Some DMA controllers minimize CPU intervention further by having a chain
address register that points to a chain control table in memory. The DMA
controller then loads its own channel parameters from memory. Generally,
the more sophisticated the DMA controller, the less servicing the CPU has
to perform.
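Auto-initialization can be sketched the same way: when terminal count is reached, the current registers are reloaded from the base registers with no CPU involvement (again, an illustrative model, not a real device interface):

```python
# Hypothetical sketch of auto-initialization: on terminal count, the
# current registers are reloaded from the base registers and the channel
# keeps running, with no CPU reprogramming between sequences.
class AutoInitChannel:
    def __init__(self, base_address, base_count):
        self.base_address, self.base_count = base_address, base_count
        self.current_address, self.current_count = base_address, base_count
        self.sequences_completed = 0

    def transfer_one(self):
        self.current_address += 1
        self.current_count -= 1
        if self.current_count < 0:                   # reached terminal count
            self.sequences_completed += 1
            # auto-initialize: reload current registers from base registers
            self.current_address = self.base_address
            self.current_count = self.base_count

ch = AutoInitChannel(0x2000, 3)    # 4 transfers per sequence (count + 1)
for _ in range(8):                 # enough for two full sequences
    ch.transfer_one()
print(ch.sequences_completed)      # 2
print(ch.current_count)            # 3 -- reloaded, ready for the next sequence
```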
A DMA controller has one or more status registers that are read by the CPU
to determine the state of each DMA channel. The status register typically
indicates whether a DMA request is asserted on a channel and whether a
channel has reached TC. Reading the status register often clears the
terminal count information in the register, which leads to problems when
multiple programs are trying to use different DMA channels.
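The read-clears-TC hazard mentioned above can be demonstrated with a toy model (the register layout is hypothetical):

```python
# Illustrative sketch of the read-clears-TC hazard: the status register
# is shared by all channels, and reading it clears every TC bit, so a
# second reader can miss its own channel's completion.
class DMAStatus:
    def __init__(self):
        self.tc_bits = 0                  # one TC bit per channel

    def set_tc(self, channel):
        self.tc_bits |= (1 << channel)

    def read(self):
        value = self.tc_bits
        self.tc_bits = 0                  # reading clears the TC information
        return value

status = DMAStatus()
status.set_tc(0)                          # channel 0 reaches TC
status.set_tc(2)                          # channel 2 reaches TC
first = status.read()                     # program A reads, sees both bits...
second = status.read()                    # ...program B now sees nothing
print(bin(first))   # 0b101
print(second)       # 0 -- channel 2's completion was lost to program B
```

This is exactly why two independent programs polling the same status register for different channels can interfere with each other.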
Advantages
DMA has several advantages over polling and interrupts. DMA is fast
because a dedicated piece of hardware transfers data from one computer
location to another and only one or two bus read/write cycles are required
per piece of data transferred. In addition, DMA is usually required to achieve
maximum data transfer speed, and is thus useful for high-speed data
acquisition devices. DMA also minimizes latency in servicing a data
acquisition device because the dedicated hardware responds more quickly
than interrupts and transfer time is short. Minimizing latency reduces the
amount of temporary storage (memory) required on an I/O device. DMA also
off-loads the processor, which means the processor does not have to
execute any instructions to transfer data. Therefore, the processor is not
used for handling the data transfer activity and is available for other
processing activity. Also, in systems where the processor primarily operates
out of its cache, data transfer is actually occurring in parallel, thus increasing
overall system utilization.
12.4 Synchronization Requirements for DMA and Interrupts
Software designers often have to work with data structures that are shared
with interrupt service routines (ISRs) or DMA devices. This requires
performing atomic updates to the shared critical regions.
Synchronization with Interrupts
When a data structure is shared with an ISR, disabling the interrupt to
execute the critical region updates is a good technique. Keep in mind that
disabling of interrupts should be restricted to only the code that updates the
critical region. Keeping the interrupts disabled for a long time will increase
the interrupt latency.
Another option is to make use of the fact that interrupts are processed at
instruction boundaries. A single instruction that performs read as well as
write could be used to perform an atomic transaction. For example, if your
processor supports direct memory increment, you could increment a shared
semaphore without disabling interrupts.
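Python has no way to disable hardware interrupts, so the following sketch uses a lock as a stand-in for the interrupt-disable window; the pattern of keeping the protected section as short as possible is the same:

```python
# Analogy only: the lock below stands in for the interrupt-disable
# window (e.g. CLI/STI on x86). The critical region is kept minimal so
# that "interrupts" are disabled for as little time as possible.
import threading

shared_counter = 0
interrupt_disable = threading.Lock()      # stand-in for disabling interrupts

def isr_like_update():
    global shared_counter
    with interrupt_disable:               # "interrupts disabled" window
        shared_counter += 1               # critical region: one short update

threads = [threading.Thread(target=isr_like_update) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_counter)   # 100 -- no updates lost
```

Without the lock, concurrent read-modify-write sequences could interleave and lose updates, which is precisely the hazard that disabling interrupts (or a single atomic instruction) prevents.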
Synchronization with DMA
Sharing data structures with a DMA device is tricky. A DMA operation can
begin at any bus cycle boundary, which means a new DMA operation can be
started in the middle of an instruction's execution (keep in mind that
executing one instruction involves multiple bus cycles).
The best mechanism for performing critical region updates is the read-
modify-write bus cycle. With such a cycle, updates to critical regions are
atomic because the read and the write are glued together in a single
indivisible bus transaction.
Another option is to disable DMA operation while the critical region is
updated; extreme caution should be used when employing this technique.
Some processors support disabling DMA by using locked bus cycles: the
processor executes a lock instruction to disable external bus grants and,
when the critical region updates are complete, an unlock instruction to
allow bus grants again.
Another mechanism to prevent DMA might be to temporarily disable the
device that will perform DMA. For example, if the DMA operations are being
performed by an Ethernet controller, disabling the Ethernet controller will
make sure no DMA operations are started when a critical region update is
being made.
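This device-disable pattern can be sketched as follows (the controller class and its fields are invented for illustration; a real driver would write device control registers instead):

```python
# Hypothetical sketch: temporarily disable the device that initiates
# DMA, update the shared descriptor, then re-enable the device. While
# disabled, the device cannot start any DMA operation, so the update is
# atomic from the DMA engine's point of view.
class EthernetController:
    def __init__(self):
        self.dma_enabled = True

    def disable(self):
        self.dma_enabled = False

    def enable(self):
        self.dma_enabled = True

    def try_dma_write(self, buffer, index, value):
        """Device-side DMA; refuses to run while the device is disabled."""
        if not self.dma_enabled:
            return False
        buffer[index] = value
        return True

rx_descriptor = [0, 0]                   # shared buffer descriptor
nic = EthernetController()

nic.disable()                            # hold off any new DMA operations
rx_descriptor[0] = 0x1000                # critical region update:
rx_descriptor[1] = 64                    # buffer address and length
started = nic.try_dma_write(rx_descriptor, 0, 0xDEAD)   # blocked: False
nic.enable()                             # allow DMA again

print(started)         # False -- no DMA could start during the update
print(rx_descriptor)   # [4096, 64] -- descriptor updated consistently
```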
12.5 Summary
DMA is a key element of a computer system. An I/O module has two major
functions: first, to interface with the CPU and memory via the system bus
or central switch; and second, to interface with one or more peripheral
devices through tailored data links. Basic I/O operations and the
programmed I/O, interrupt-driven I/O, and direct memory access (DMA)
techniques were also discussed.
Programmed I/O does not interrupt the processor; the processor must
periodically check the status of the I/O module until the task is completed.
We showed how characters are transferred between the processor and
keyboard and display devices. Interrupt-driven I/O allows the processor to
do other tasks while waiting for the I/O device to send an interrupt, and is
therefore more efficient than programmed I/O. Direct memory access
(DMA) allows a dedicated controller or I/O module to perform a data
transfer directly while freeing the processor for other tasks. DMA interrupts
the processor only when the task is done.
Finally, we examined the synchronization requirements for DMA and
interrupts.
Self Assessment Questions
1. In the ISA bus standard, up to ____________ of memory can be
addressed for DMA.
2. In ____________ mechanism, CPU is suspended twice.
3. A ____________ is a device, usually peripheral to a computer’s CPU
that is programmed to perform a sequence of data transfers on behalf of
the CPU.
4. Channels must be enabled by the ____________ for the DMA controller
to respond to DMA requests.
5. Keeping the interrupts disabled for a long time will increase the
____________.
12.6 Terminal Questions
1. Discuss in detail the functions and operations of DMA.
2. Explain the operation of DMA controller.
3. Explain the synchronization requirements for DMA and interrupts.
12.7 Answers
Self Assessment Questions:
1. 16 megabytes
2. single bus detached DMA
3. DMA controller
4. processor
5. interrupt latency
Terminal Questions:
1. Refer Section 12.2
2. Refer Section 12.3
3. Refer Section 12.4