BT 0068
Computer Organization and Architecture
Contents
Unit 1: Data Representation in Computers
Unit 2: Register Transfer and Microoperations
Unit 3: Basic Structure of a Digital Computer
Unit 4: CPU and Register Organization
Unit 5: Interconnection Structures
Unit 6: Instruction Sets: Addressing Modes and Formats
Unit 7: Arithmetic Logic Unit
Unit 8: Binary Arithmetic
Unit 9: Memory Unit – Part I
Unit 10: Memory Unit – Part II
Unit 11: Input / Output Basics
Unit 12: Direct Memory Access
References
Prof. V. B. Nanda Gopal
Director & Dean, Directorate of Distance Education
Sikkim Manipal University of Health, Medical & Technological Sciences (SMU DDE)

Board of Studies
Dr. U. B. Pavanaja (Chairman), General Manager – Academics, Manipal Universal Learning Pvt. Ltd., Bangalore
Nirmal Kumar Nigam, HOP – IT, Sikkim Manipal University – DDE, Manipal
Prof. Bhushan Patwardhan, Chief Academics, Manipal Education, Bangalore
Dr. A. Kumaran, Research Manager (Multilingual), Microsoft Research Labs India, Bangalore
Dr. Harishchandra Hebbar, Director, Manipal Centre for Info. Sciences, Manipal
Ravindranath P. S., Director (Quality), Yahoo India, Bangalore
Dr. N. V. Subba Reddy, HOD – CSE, Manipal Institute of Technology, Manipal
Dr. Ashok Kallarakkal, Vice President, IBM India, Bangalore
Dr. Ashok Hegde, Vice President, MindTree Consulting Ltd, Bangalore
H. Hiriyannaiah, Group Manager, EDS Mphasis, Bangalore
Dr. Ramprasad Varadachar, Director, Computer Studies, Dayanand Sagar College of Engg., Bangalore

Content Preparation Team
Content Writing: Mr. Balasubramani R, Assistant Professor, Dept. of IT, Sikkim Manipal University – DDE, Manipal
Content Editing: Dr. E. R. Naganathan, Professor & HOD – IT, Sikkim Manipal University – DDE, Manipal
Language Editing: Ms. Chandrika P.S., HOD – English, Sharada P.U. College, Mangalore
Edition: Spring 2009
This book is a distance education module comprising a collection of learning material for our students. All rights reserved. No part of this work may be reproduced in any form by any means without permission in writing from Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim. Printed and published on behalf of Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim by Mr.Rajkumar Mascreen, GM, Manipal Universal Learning Pvt. Ltd., Manipal – 576 104. Printed at Manipal Press Limited, Manipal.
SUBJECT INTRODUCTION

The goal of this book is to introduce the basic concepts of computer
architecture and organization, in order to allow computer scientists to
recognize when programs are not as efficient as they could be and to
transform them so that they make better use of the underlying machine.
Another, closely related, goal is to provide the necessary background in
computer architecture to evaluate competing algorithms to decide which is
likely to be the most efficient for a given machine, even before they are
expressed in a programming language.
Unit 1: This unit discusses the basic building blocks of a digital computer
and the different data representations used in computers. It also briefly
explains different codes.
Unit 2: This unit explains the Register Transfer Language, gives an
overview of bus and memory transfers, and explains the different
microoperations in detail.
Unit 3: This unit discusses the Von Neumann architecture of the computer
and also provides input on the early evolution of computers.
Unit 4: This unit provides detailed coverage of the register organization of a
basic computer, with special reference to the register organization of the
Intel 8085 microprocessor and the Motorola and Zilog machines.
Unit 5: This unit discusses the data transfers between the different modules
of a system, such as the I/O module and the memory module. It explains the
structure, elements and functions of the interconnection entity called a bus.
Here we also study different structures of the CPU with single-bus or
two-bus systems.
Unit 6: This unit explains instruction characteristics and types, and the
different data types supported by machines such as the VAX and IBM
systems. It then covers different operations, such as data transfer,
arithmetic and logical operations, and transfer-of-control operations, with
examples. It also covers the different addressing modes along with the
formats and lengths of instructions. Finally, it explains the stacks and
subroutines that are necessary for the execution of certain instructions,
such as transfer of control.
Unit 7: This unit discusses the main entity of the CPU, the arithmetic logic
unit. It also introduces different number representations.
Unit 8: This unit discusses different operations like addition, subtraction,
multiplication and division of numbers.
Unit 9: This unit discusses the Memory Unit, dealing with internal and
external memory. It also deals with the system memory considerations,
structures and principle of the cache memory.
Unit 10: This unit briefly touches upon virtual memory and memory
management in the operating system.
Unit 11: This unit discusses input/output, which is responsible for
interaction with the peripherals. It also discusses external devices and
different I/O functions, and explains in detail the programmed and
interrupt-driven I/O techniques.
Unit 12: This unit covers the basics of direct memory access (DMA). Here
we also give the detailed working of the DMA controller and its
synchronization requirements with interrupts.
Computer Organization and Architecture Unit 1
Sikkim Manipal University - DDE Page No. 1
Unit 1 Data Representation in Computers
Structure:
1.1 Introduction
Objectives
1.2 Digital Computers
1.3 Data Types
1.4 Complements
1.5 Fixed-Point Representation
1.6 Floating-Point Representation
1.7 Other Binary Codes
1.8 Summary
1.9 Terminal Questions
1.10 Answers
1.1 Introduction
The digital computer is a digital system that performs various computational
tasks. The word digital implies that the information in the computer is
represented by variables that take a limited number of discrete values.
These values are processed internally by components that can maintain a
limited number of discrete states. The decimal digits 0, 1, 2, …, 9, for example,
provide 10 discrete values. The first electronic digital computers, developed
in the late 1940s, were used primarily for numerical computations. In this
case the discrete elements are the digits. From this application the term
digital computer has emerged. In practice, digital computers function more
reliably if only two states are used. Because of the physical restriction of
components, and because human logic tends to be binary (i.e. true-or-false,
yes-or-no statements), digital components that are constrained to take
discrete values are further constrained to take only two values and are said
to be binary.
Objectives:
After studying this unit, the learner will be able to:
• explain the various units of a digital computer
• understand different data types
• explain fixed-point and floating-point number representation
• discuss various binary and error-detection codes
1.2 Digital Computers
Digital computers use the binary number system, which has two digits:
0 and 1. A binary digit is called a bit. Information is represented in digital
computers in groups of bits. By using various coding techniques, groups of
bits can be made to represent not only binary numbers but also other
discrete symbols, such as decimal digits or letters of the alphabet. By the
judicious use of binary arrangements and by using various coding
techniques, the groups of bits are used to develop complete sets of
instructions for performing various types of computations.
In contrast to the common decimal numbers that employ the base 10
system, binary numbers use a base 2 system with two digits: 0 and 1. The
decimal equivalent of a binary number can be found by expanding it into a
power series with a base of 2. For example, the binary number 1001011
represents a quantity that can be converted to a decimal number by
multiplying each bit by the base 2 raised to an integer power as follows:
1 x 2^6 + 0 x 2^5 + 0 x 2^4 + 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = 75
The seven bits 1001011 represent a binary number whose decimal
equivalent is 75. However, this same group of seven bits represents the
letter K when used in conjunction with a binary code for the letters of the
alphabet. It may also represent a control code for specifying some decision
logic in a particular digital computer. In other words, groups of bits in a
digital computer are used to represent many different things. This is similar
to the concept that the same letters of an alphabet are used to construct
different languages, such as English and French.
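To make the weighted-sum interpretation concrete, the expansion above can be sketched in Python (the function and its name are illustrative additions, not part of the original unit):

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of binary digits to its decimal value by
    weighting each bit with the appropriate power of 2 (Horner form)."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)  # same sum as 1 x 2^6 + ... + 1 x 2^0
    return value

print(binary_to_decimal("1001011"))  # 75
```

Python's built-in `int("1001011", 2)` performs the same conversion.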
A computer system is sometimes subdivided into two functional entities:
hardware and software. The hardware of the computer consists of all the
electronic components and electromechanical devices that comprise the
physical entity of the device. Computer software consists of the instructions
and the data that the computer manipulates to perform various data-
processing tasks. A sequence of instructions for the computer is called a
program. The data that are manipulated by the program constitute the
database.
A computer system is composed of its hardware and the system software
available for its use. The system software of a computer consists of a
collection of programs whose purpose is to make more effective use of the
computer. The programs included in a system software package are
referred to as the operating system. They are distinguished from
application programs written by the user for the purpose of solving particular
problems. For example, a high-level language program written by a user to
solve particular data-processing needs is an application program, but the
compiler that translates the high-level language program to machine
language is a system program. The customer who buys a computer
system would need, in addition to the hardware, any available software
needed for effective operation of the computer. The system software is an
indispensable part of a total computer system. Its function is to compensate
for the differences that exist between user needs and the capability of the
hardware.
The hardware of the computer is usually divided into three major parts as
shown in Fig. 1.1.
Fig. 1.1: Block diagram of a digital computer
The Central Processing Unit (CPU) contains an Arithmetic and Logic
Unit (ALU) for manipulating data, a number of registers for storing data, and
control circuits for fetching and executing instructions. The memory of a
computer contains storage for instructions and data. It is called a Random
Access Memory (RAM) because the CPU can access any location in
memory at random and retrieve the binary information within a fixed interval
of time. The Input-Output Processor (IOP) contains electronic circuits for
communicating and controlling the transfer of information between the
computer and the outside world. The input and output devices connected to
the computer include keyboards, printers, terminals, magnetic disk drives,
and other communication devices.
This book provides the basic knowledge necessary to understand the
hardware operations of a computer system. The subject is sometimes
considered from three different points of view, depending on the interest of
the investigator. When dealing with computer hardware it is customary to
distinguish between what is referred to as computer organization, computer
design, and computer architecture.
Computer organization is concerned with the way the hardware
components operate and the way they are connected together to form the
computer system. The various components are assumed to be in place and
the task is to investigate the organizational structure to verify that the
computer parts operate as intended.
Computer design is concerned with the hardware design of the computer.
Once the computer specifications are formulated, it is the task of the
designer to develop hardware for the system. Computer design is
concerned with the determination of what hardware should be used and how
the parts should be connected. This aspect of computer hardware is
sometimes referred to as computer implementation.
Computer architecture is concerned with the structure and behavior of the
computers as seen by the user. It includes the information formats, the
instruction set, and techniques for addressing memory. The architectural
design of a computer system is concerned with the specifications of the
various functional modules, such as processors and memories, and
structuring them together into a computer system.
1.3 Data Types
Binary information in digital computers is stored in memory or processor
registers. Registers contain either data or control information. Control
information is a bit or a group of bits used to specify the sequence of
command signals needed for manipulation of the data in other registers.
Data are numbers and other binary-coded information that are operated on,
to achieve required computational results.
The data types found in the registers of digital computers may be classified
as being one of the following categories: (1) numbers used in arithmetic
computations, (2) letters of the alphabet used in data processing, and
(3) other discrete symbols used for specific purposes. All types of data,
except binary numbers, are represented in computer registers in binary-
coded form. This is because registers are made up of flip-flops and flip-
flops are two-state devices that can store only 1’s and 0’s. The binary
number system is the most natural system to be used in a digital computer.
But sometimes it is convenient to employ different number systems,
especially the decimal number system, since it is used by people to perform
arithmetic computations.
Number Systems
A number system of base, or radix, r is a system that uses distinct symbols
for r digits. Numbers are represented by a string of digit symbols. To
determine the quantity that the number represents, it is necessary to
multiply each digit by an integer power of r and then form the sum of all
weighted digits. For example, the decimal number system in everyday
use employs the radix 10 system. The 10 symbols are 0, 1, 2, 3, 4, 5, 6, 7,
8, and 9. The string of digits 724.5 is interpreted to represent the quantity
7 x 10^2 + 2 x 10^1 + 4 x 10^0 + 5 x 10^-1
that is, 7 hundreds, plus 2 tens, plus 4 units, plus 5 tenths. Every decimal
number can be similarly interpreted to find the quantity it represents.
The binary number system uses the radix 2. The two digit symbols used
are 0 and 1. The string of digits 101101 is interpreted to represent the
quantity
1 x 2^5 + 0 x 2^4 + 1 x 2^3 + 1 x 2^2 + 0 x 2^1 + 1 x 2^0 = 45
To distinguish between different radix numbers, the digits will be enclosed in
parentheses and the radix of the number inserted as a subscript. For
example, to show the equality between decimal and binary forty-five we will
write (101101)₂ = (45)₁₀.
Besides the decimal and binary number systems, the octal (radix 8) and
hexadecimal (radix 16) are important in digital computer work. The eight
symbols of the octal system are 0, 1, 2, 3, 4, 5, 6, and 7. The 16 symbols of
the hexadecimal system are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.
The last six symbols are, unfortunately, identical to the letters of the
alphabet and can cause confusion at times. However, this is the convention
that has been adopted. When used to represent hexadecimal digits, the
symbols A, B, C, D, E, F correspond to the decimal numbers 10, 11, 12, 13,
14, 15, respectively.
A number in radix r can be converted into the familiar decimal system by
forming the sum of the weighted digits. For example, octal 736.4 is
converted to decimal as follows:
(736.4)₈ = 7 x 8^2 + 3 x 8^1 + 6 x 8^0 + 4 x 8^-1
= 7 x 64 + 3 x 8 + 6 x 1 + 4 / 8 = (478.5)₁₀
The equivalent decimal number of hexadecimal F3 is obtained from the
following calculation:
(F3)₁₆ = F x 16 + 3 = 15 x 16 + 3 = (243)₁₀
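The weighted-digit rule for an arbitrary radix, including a fraction part, can be sketched in Python (the function and its name are illustrative additions):

```python
def radix_to_decimal(number: str, r: int) -> float:
    """Convert a string in base r (optionally with a radix point)
    to decimal by summing the weighted digits."""
    digits = "0123456789ABCDEF"
    integer, _, fraction = number.upper().partition(".")
    value = 0.0
    for d in integer:                 # integer digits carry powers r^(n-1) ... r^0
        value = value * r + digits.index(d)
    weight = 1.0 / r
    for d in fraction:                # fraction digits carry powers r^-1, r^-2, ...
        value += digits.index(d) * weight
        weight /= r
    return value

print(radix_to_decimal("736.4", 8))   # 478.5
print(radix_to_decimal("F3", 16))     # 243.0
```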
Conversion from decimal to its equivalent representation in the radix r
system is carried out by separating the number into its integer and fraction
parts and converting each part separately. The conversion of a decimal
integer into a base r representation is done by successive divisions by r and
accumulation of the remainders. The conversion of a decimal fraction to
radix r representation is accomplished by successive multiplications by r
and accumulation of the integer digits so obtained. Fig.1.2 demonstrates
these procedures.
Integer = 41                     Fraction = 0.6875
41 ÷ 2 = 20  remainder 1         0.6875 x 2 = 1.3750
20 ÷ 2 = 10  remainder 0         0.3750 x 2 = 0.7500
10 ÷ 2 =  5  remainder 0         0.7500 x 2 = 1.5000
 5 ÷ 2 =  2  remainder 1         0.5000 x 2 = 1.0000
 2 ÷ 2 =  1  remainder 0
 1 ÷ 2 =  0  remainder 1
(41)₁₀ = (101001)₂               (0.6875)₁₀ = (0.1011)₂
(41.6875)₁₀ = (101001.1011)₂
Fig. 1.2: Conversion of decimal 41.6875 into binary
The conversion of decimal 41.6875 into binary is done by first separating the
number into its integer part 41 and fraction part 0.6875. The integer part is
converted by dividing 41 by r = 2 to give an integer quotient 20 and a
remainder of 1. The quotient is again divided by 2 to give a new quotient
and remainder. This process is repeated until the integer quotient becomes
0. The coefficients of the binary number are obtained from the remainders
with the first remainder giving the low-order bit of the converted binary
number.
The fraction part is converted by multiplying it by r = 2 to give an integer and
a fraction. The new fraction (without the integer) is multiplied again by 2 to
give a new integer and a new fraction. This process is repeated until the
fraction part becomes zero or until the number of digits obtained gives the
required accuracy. The coefficients of the binary fraction are obtained from
the integer digits with the first integer computed being the digit to be placed
next to the binary point. Finally, the two parts are combined to give the total
required conversion.
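The successive-division and successive-multiplication procedures of Fig. 1.2 can be sketched in Python (an illustrative addition; the function name is hypothetical):

```python
def decimal_to_binary(integer: int, fraction: float, max_bits: int = 16) -> str:
    """Mirror Fig. 1.2: repeated division by 2 for the integer part,
    repeated multiplication by 2 for the fraction part."""
    int_bits = ""
    while integer > 0:
        integer, remainder = divmod(integer, 2)
        int_bits = str(remainder) + int_bits   # first remainder is the low-order bit
    frac_bits = ""
    while fraction > 0 and len(frac_bits) < max_bits:
        fraction *= 2
        digit = int(fraction)                  # first integer goes next to the binary point
        frac_bits += str(digit)
        fraction -= digit
    return (int_bits or "0") + "." + frac_bits

print(decimal_to_binary(41, 0.6875))  # 101001.1011
```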
Octal and Hexadecimal Numbers
The conversion from and to binary, octal, and hexadecimal representation
plays an important part in digital computers. Since 2^3 = 8 and 2^4 = 16, each
octal digit corresponds to three binary digits and each hexadecimal digit
corresponds to four binary digits. The conversion from binary to octal is
easily accomplished by partitioning the binary number into groups of three
bits each starting from the least significant bit (LSB) position. The
corresponding octal digit is then assigned to each group of bits and the
string of digits so obtained gives the octal equivalent of the binary number.
Consider, for example, a 16-bit register. Physically, one may think of the
register as composed of 16 binary storage cells, with each cell capable of
holding either a 1 or a 0. Suppose that the bit configuration stored in the
register is as shown in Fig.1.3. Since a binary number consists of a string of
1’s and 0’s, the 16-bit register can be used to store any binary number from
0 to 2^16 – 1. For the particular example shown, the binary number stored in
the register is the equivalent of decimal 44899. Starting from the low-order
bit, we partition the register into groups of three bits each (the sixteenth bit
remains in a group by itself). Each group of three bits is assigned its octal
equivalent and placed on the top of the register. The string of octal digits so
obtained represents the octal equivalent of the binary number.
Octal 1 2 7 5 4 3
Binary 1 0 1 0 1 1 1 1 0 1 1 0 0 0 1 1
Hexadecimal A F 6 3
Fig. 1.3: Binary, octal and hexadecimal conversion.
Octal number   Binary-coded octal   Decimal equivalent
0 000 0
1 001 1
2 010 2
3 011 3
4 100 4
5 101 5
6 110 6
7 111 7
10 001 000 8
11 001 001 9
12 001 010 10
24 010 100 20
62 110 010 50
143 001 100 011 99
370 011 111 000 248
Table 1.1: Binary-Coded Octal Numbers
Hexadecimal number   Binary-coded hexadecimal   Decimal equivalent
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
A 1010 10
B 1011 11
C 1100 12
D 1101 13
E 1110 14
F 1111 15
14 0001 0100 20
32 0011 0010 50
63 0110 0011 99
F8 1111 1000 248
Table 1.2: Binary-Coded Hexadecimal Numbers
Conversion from binary to hexadecimal is similar except that the bits are
divided into groups of four. The corresponding hexadecimal digit for each
group of four bits is written as shown below the register of Fig.1.3. The
string of hexadecimal digits so obtained represents the hexadecimal
equivalent of the binary number. The corresponding octal digit for each
group of three bits is easily remembered after studying the first eight entries
listed in Table 1.1. The correspondence between a hexadecimal digit and its
equivalent 4-bit code can be found in the first 16 entries of Table 1.2.
Table 1.1 lists a few octal numbers and their representation in registers in
binary-coded form. The binary code is obtained by the procedure explained
above. Each octal digit is assigned a 3-bit code as specified by the entries
of the first eight digits in the table. Similarly, Table 1.2 lists a few
hexadecimal numbers and their representation in registers in binary-coded
form. Here the binary code is obtained by assigning to each hexadecimal
digit the 4-bit code listed in the first 16 entries of the table.
Comparing the binary-coded octal and hexadecimal numbers with their
binary number equivalent we find that the bit combination in all three
representations is exactly the same. For example, decimal 99, when
converted to binary, becomes 1100011. The binary-coded octal equivalent
of decimal 99 is 143 (001 100 011) and the binary-coded hexadecimal of
decimal 99 is 63 (0110 0011). If we neglect the leading zeros in these two
binary representations, we find that their bit combination is identical. This
should be so because of the straightforward conversion that exists between
binary numbers and octal or hexadecimal. The point of all this is that a string
of 1s and 0s stored in a register could represent a binary number, but this
same string of bits may be interpreted as holding an octal number in binary-
coded form (if we divide the bits into groups of three) or as holding a
hexadecimal number in binary-coded form (if we divide the bits into groups
of four).
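The grouping rules can be sketched in Python (the function names are illustrative additions); the example uses the register contents of Fig. 1.3:

```python
def binary_to_octal(bits: str) -> str:
    """Group bits in threes from the least significant bit and map
    each group to its octal digit (padding the high-order group)."""
    bits = bits.zfill((len(bits) + 2) // 3 * 3)
    return "".join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))

def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the least significant bit."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join("0123456789ABCDEF"[int(bits[i:i + 4], 2)]
                   for i in range(0, len(bits), 4))

word = "1010111101100011"        # 16-bit register of Fig. 1.3
print(binary_to_octal(word))     # 127543
print(binary_to_hex(word))       # AF63
print(int(word, 2))              # 44899
```

The same bit string yields all three interpretations, which is exactly the point made in the text.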
The registers in a digital computer contain many bits. Specifying the content
of registers by their binary values will require a long string of binary digits. It
is more convenient to specify content of registers by their octal or
hexadecimal equivalent. The number of digits is reduced to one-third in the
octal designation and to one-fourth in the hexadecimal designation. For
example, the binary number 1111 1111 1111 has 12 digits. It can be
expressed in octal as 7777 (four digits) or in hexadecimal as FFF (three
digits). Computer manuals invariably choose either the octal or the
hexadecimal designation for specifying contents of registers.
Decimal Representation
The binary number system is the most natural system for a computer, but
people are accustomed to the decimal system. One way to solve this
conflict is to convert all input decimal numbers into binary numbers, let the
computer perform all arithmetic operations in binary and then convert the
binary results back to decimal for the human user to understand. However,
it is also possible for the computer to perform arithmetic operations directly
with decimal numbers provided they are placed in registers in a coded form.
Decimal numbers enter the computer usually as binary-coded alphanumeric
characters. These codes, introduced later, may contain from six to eight bits
for each decimal digit. When decimal numbers are used for internal
arithmetic computations, they are converted into a binary code with four bits
per digit.
A binary code is a group of n bits that can assume up to 2^n distinct
combinations of 1s and 0s, with each combination representing one element
of the set being coded. For example, a set of four elements can be coded by
a 2-bit code, with each element assigned one of the following bit
combinations: 00, 01, 10, or 11. A set of eight elements requires a 3-bit
code; a set of 16 elements requires a 4-bit code, and so on. A binary code
will have some unassigned bit combinations if the number of elements in the
set is not a power of 2. The 10 decimal digits from 0 to 9 form such a set. A
binary code that distinguishes among 10 elements must contain at least four
bits, but six combinations will remain unassigned. Numerous different
codes can be obtained by arranging four bits in 10 distinct combinations.
The bit assignment most commonly used for the decimal digits is the
straight binary assignment listed in the first 10 entries of Table 1.3. This
particular code is called Binary Coded Decimal (BCD). Other decimal
codes are sometimes used and a few of them are given in the section 1.7.
It is very important to understand the difference between the conversion of
decimal numbers into binary and the binary coding of decimal numbers. For
example, when converted to a binary number, the decimal number 99 is
represented by the string of bits 1100011, but when represented in BCD, it
becomes 1001 1001. The only difference between a decimal number
represented by the familiar digit symbols 0, 1, 2, …, 9 and the BCD symbols
0000, 0001, …, 1001 is in the symbols used to represent the digits – the
number itself is exactly the same. A few decimal numbers and their
representation in BCD are listed in Table 1.3.
Decimal number Binary-coded decimal (BCD) number
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
10 0001 0000
20 0010 0000
50 0101 0000
99 1001 1001
248 0010 0100 1000
Table 1.3: Binary Coded Decimal (BCD) Numbers
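The distinction between converting a decimal number to binary and coding it in BCD can be shown with a short Python sketch (the helper name is an illustrative addition):

```python
def to_bcd(n: int) -> str:
    """Encode each decimal digit of n as its own 4-bit group (BCD)."""
    return " ".join(format(int(d), "04b") for d in str(n))

print(to_bcd(99))         # 1001 1001  (binary-coded decimal)
print(format(99, "b"))    # 1100011    (conversion to a binary number)
print(to_bcd(248))        # 0010 0100 1000
```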
Alphanumeric Representation
Many applications of digital computers require the handling of data that
consist not only of numbers, but also of the letters of the alphabet and
certain special characters. An alphanumeric character set is a set of
elements that includes the 10 decimal digits, the 26 letters of the alphabet
and a number of special characters, such as $, +, and =. Such a set
contains between 32 and 64 elements (if only uppercase letters are
included) or between 64 and 128 (if both uppercase and lowercase letters
are included). In the first case, the binary code will require six bits and in
the second case, seven bits. The standard alphanumeric binary code is the
American Standard Code for Information Interchange (ASCII), which
uses seven bits to code 128 characters. The binary code for the uppercase
letters, the decimal digits, and a few special characters is listed in Table 1.4.
Note that the decimal digits in ASCII can be converted into BCD by
removing the three high-order bits, 011.
Binary codes play an important part in digital computer operations. The
codes must be in binary because registers can only hold binary information.
One must realize that binary codes merely change the symbols, not the
meaning of the discrete elements they represent. The operations specified
for digital computers must take into consideration the meaning of the bits
stored in registers so that operations are performed on operands of the
same type. In inspecting the bits of a computer register at random, one is
likely to find that it represents some type of coded information rather than a
binary number.
Binary codes can be formulated for any set of discrete elements such as the
musical notes and chess pieces and their positions on the chessboard.
Binary codes are also used to formulate instructions that specify control
information for the computer.
Character Binary code Character Binary code
A 100 0001 0 011 0000
B 100 0010 1 011 0001
C 100 0011 2 011 0010
D 100 0100 3 011 0011
E 100 0101 4 011 0100
F 100 0110 5 011 0101
G 100 0111 6 011 0110
H 100 1000 7 011 0111
I 100 1001 8 011 1000
J 100 1010 9 011 1001
K 100 1011
L 100 1100
M 100 1101 Space 010 0000
N 100 1110 . 010 1110
O 100 1111 ( 010 1000
P 101 0000 + 010 1011
Q 101 0001 $ 010 0100
R 101 0010 * 010 1010
S 101 0011 ) 010 1001
T 101 0100 - 010 1101
U 101 0101 / 010 1111
V 101 0110 , 010 1100
W 101 0111 = 011 1101
X 101 1000
Y 101 1001
Z 101 1010
Table 1.4: American Standard Code for Information Interchange (ASCII)
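The observation that an ASCII decimal digit becomes its BCD code when the three high-order bits (011) are removed can be checked with a short Python sketch (an illustrative addition):

```python
# The ASCII code of a decimal digit is 011 followed by the digit's 4-bit
# BCD code, so masking off the three high-order bits recovers the BCD value.
for ch in "0579":
    ascii_code = ord(ch)
    bcd = ascii_code & 0b0001111   # drop the high-order 011
    print(ch, format(ascii_code, "07b"), format(bcd, "04b"))
```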
1.4 Complements
Complements are used in digital computers for simplifying the subtraction
operation and for logical manipulation. There are two types of complements
for each base r system: the r’s complement and the (r-1)’s complement.
When the value of the base r is substituted in the name, the two types are
referred to as the 2’s and 1’s complement for binary numbers and the 10’s
and 9’s complement for decimal numbers.
(r-1)’s Complement
Given a number N in base r having n digits, the (r-1)’s complement of N is
defined as (r^n – 1) – N. For decimal numbers r = 10 and r – 1 = 9, so the
9’s complement of N is (10^n – 1) – N. Now, 10^n represents a number that
consists of a single 1 followed by n 0s, and 10^n – 1 is a number represented
by n 9s. For example, with n = 4 we have 10^4 = 10000 and 10^4 – 1 = 9999. It
follows that the 9’s complement of a decimal number is obtained by
subtracting each digit from 9. For example, the 9’s complement of 546700
is 999999 – 546700 = 453299 and the 9’s complement of 12389 is 99999 –
12389 = 87610.
For binary numbers, r = 2 and r – 1 = 1, so the 1’s complement of N is
(2^n – 1) – N. Again, 2^n is represented by a binary number that consists of a
1 followed by n 0s, and 2^n – 1 is a binary number represented by n 1s. For
example, with n = 4, we have 2^4 = (10000)₂ and 2^4 – 1 = (1111)₂. Thus the
1’s complement of a binary number is obtained by subtracting each digit
from 1. However, the subtraction of a binary digit from 1 causes the bit to
change from 0 to 1 or from 1 to 0. Therefore, the 1’s complement of a
binary number is formed by changing 1s into 0s and 0s into 1s. For
example, the 1’s complement of 1011001 is 0100110 and the 1’s
complement of 0001111 is 1110000.
The (r – 1)’s complement of an octal or hexadecimal number is obtained by
subtracting each digit from 7 or F (decimal 15), respectively.
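The digit-by-digit rule works for any base up to 16 and can be sketched in Python (the function name is an illustrative addition):

```python
def diminished_radix_complement(number: str, r: int) -> str:
    """(r-1)'s complement: subtract each digit of the number from r - 1."""
    digits = "0123456789ABCDEF"
    return "".join(digits[(r - 1) - digits.index(d)] for d in number.upper())

print(diminished_radix_complement("546700", 10))  # 453299  (9's complement)
print(diminished_radix_complement("1011001", 2))  # 0100110 (1's complement)
```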
r’s Complement
The r’s complement of an n-digit number N in base r is defined as r^n – N for
N ≠ 0 and 0 for N = 0. Comparing with the (r – 1)’s complement, we note
that the r’s complement is obtained by adding 1 to the (r – 1)’s complement,
since r^n – N = [(r^n – 1) – N] + 1. Thus the 10’s complement of the decimal
2389 is 7610 + 1 = 7611 and is obtained by adding 1 to the 9’s complement
value. The 2’s complement of binary 101100 is 010011 + 1 = 010100 and is
obtained by adding 1 to the 1’s complement value.
Since 10^n is a number represented by a 1 followed by n 0s, 10^n – N,
which is the 10’s complement of N, can also be formed by leaving all least
significant 0s unchanged, subtracting the first nonzero least significant digit
from 10, and then subtracting all higher significant digits from 9. The 10’s
complement of 246700 is 753300 and is obtained by leaving the two zeros
unchanged, subtracting 7 from 10, and subtracting the other three digits
from 9. Similarly, the 2’s complement can be formed by leaving all least
significant 0s and the first 1 unchanged, and then replacing 1s by 0s and
0s by 1s in all other higher significant bits. The 2’s complement of 1101100
is 0010100 and is obtained by leaving the two low-order 0s and the first
1 unchanged, and then replacing 1s by 0s and 0s by 1s in the other four
most significant bits.
In the definitions above it was assumed that the numbers do not have a
radix point. If the original number N contains a radix point, it should be
removed temporarily to form the r’s or (r-1)’s complement. The radix point
is then restored to the complemented number in the same relative position.
It is also worth mentioning that the complement of the complement restores
the number to its original value. The r’s complement of N is r^n – N, and the
complement of that complement is r^n – (r^n – N) = N, giving back the original
number.
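The relation r^n – N = [(r^n – 1) – N] + 1 and the complement-of-the-complement property can be checked with a short Python sketch; the function name r_complement is our own, not from the text:

```python
def r_complement(digits, r):
    """r's complement of an n-digit number: r**n - N for N != 0, and 0 for N = 0."""
    n = len(digits)
    value = int("".join(map(str, digits)), r)
    if value == 0:
        return [0] * n
    comp = r**n - value
    out = []
    for _ in range(n):          # peel off the digits of the complement, low to high
        out.append(comp % r)
        comp //= r
    return out[::-1]

print(r_complement([2, 3, 8, 9], 10))       # 10's complement of 2389 -> [7, 6, 1, 1]
print(r_complement([1, 0, 1, 1, 0, 0], 2))  # 2's complement of 101100 -> [0, 1, 0, 1, 0, 0]
```

Complementing twice returns the original digits, as the text observes.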
Subtraction of Unsigned Numbers
The direct method of subtraction taught in elementary schools uses the
borrow concept. In this method we borrow a 1 from a higher significant
position when the minuend digit is smaller than the corresponding
subtrahend digit. This seems to be easiest when people perform
subtraction with paper and pencil. When subtraction is implemented with
digital hardware, this method is found to be less efficient than the method
that uses complements.
The subtraction of two n-digit unsigned numbers M – N (N ≠ 0) in base r
can be done as follows:
1. Add the minuend M to the r’s complement of the subtrahend N.
This performs M + (r^n – N) = M – N + r^n.
2. If M ≥ N, the sum will produce an end carry r^n, which is discarded,
and what is left is the result M – N.
3. If M < N, the sum does not produce an end carry and is equal to r^n –
(N – M), which is the r’s complement of (N – M). To obtain the
answer in a familiar form, take the r’s complement of the sum and
place a negative sign in front.
Consider, for example, the subtraction 72532 – 13250 = 59282. The 10’s
complement of 13250 is 86750. Therefore:
M = 72532
10’s complement of N = +86750
Sum = 159282
Discard end carry 10^5 = -100000
Answer = 59282
Now consider an example with M < N. The subtraction 13250 – 72532
produces negative 59282. Using the procedure with complements, we have
M = 13250
10’s complement of N = +27468
Sum = 40718
There is no end carry. Answer is negative 59282 = 10’s complement of
40718.
Since we are dealing with unsigned numbers, there is really no way to get
an unsigned result for the second example. When working with paper and
pencil, we recognize that the answer must be changed to a signed negative
number. When subtracting with complements, the negative answer is
recognized by the absence of the end carry and the complemented result.
Subtraction with complements is done with binary numbers in a similar
manner using the same procedure outlined above. Using the two binary
numbers X = 1010100 and Y = 1000011, we perform the subtraction X – Y
and Y – X using 2’s complements:
X = 1010100
2’s complement of Y = +0111101
Sum = 10010001
Discard end carry 2^7 = -10000000
Answer: X – Y = 0010001
Y = 1000011
2’s complement of X = +0101100
Sum = 1101111
There is no end carry. Answer is negative 0010001 = 2’s complement of
1101111.
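The three-step procedure can be sketched in Python as follows; subtract_unsigned is an illustrative name of ours, and the calls reproduce the worked decimal and binary examples above:

```python
def subtract_unsigned(m, n, digits, r=10):
    """M - N for unsigned n-digit numbers via the r's complement of N."""
    s = m + (r**digits - n)        # M + (r^n - N) = M - N + r^n
    if s >= r**digits:             # end carry present, so M >= N
        return s - r**digits       # discard the carry; what is left is M - N
    return -(r**digits - s)        # no end carry: negate the r's complement of the sum

print(subtract_unsigned(72532, 13250, 5))               # 59282
print(subtract_unsigned(13250, 72532, 5))               # -59282
print(subtract_unsigned(0b1010100, 0b1000011, 7, r=2))  # 17, i.e. 0010001
```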
1.5 Fixed-Point Representation
Positive integers, including zero, can be represented as unsigned numbers.
However, to represent negative integers, we need a notation for negative
values. In ordinary arithmetic, a negative number is indicated by a minus
sign and a positive number by a plus sign. Because of hardware limitations,
computers must represent everything with 1s and 0s, including the sign of a
number. As a consequence, it is customary to represent the sign with a bit
placed in the leftmost position of the number. The convention is to make the
sign bit equal to 0 for positive and to 1 for negative.
In addition to the sign, a number may have a binary (or decimal) point.
The position of the binary point is needed to represent fractions, integers, or
mixed integer-fraction numbers. The representation of the binary point in a
register is complicated by the fact that it is characterized by a position in the
register. There are two ways of specifying the position of the binary point in
a register: by giving it a fixed position or by employing a floating-point
representation. The fixed-point method assumes that the binary point is
always fixed in one position. The two positions most widely used are (1) a
binary point in the extreme left of the register to make the stored number a
fraction, and (2) a binary point in the extreme right of the register to make
the stored number an integer. In either case, the binary point is not actually
present, but its presence is assumed from the fact that the number stored in
the register is treated as a fraction or as an integer. The floating-point
representation uses a second register to store a number that designates the
position of the decimal point in the first register. Floating-point
representation is discussed further in the next section.
Integer Representation
When an integer binary number is positive, the sign is represented by 0 and
the magnitude by a positive binary number. When the number is negative,
the sign is represented by 1 but the rest of the number may be represented
in one of three possible ways:
1. Signed-magnitude representation
2. Signed-1’s complement representation
3. Signed-2’s complement representation
The signed-magnitude representation of a negative number consists of the
magnitude and a negative sign. In the other two representations, the
negative number is represented in either the 1’s or 2’s complement of its
positive value. As an example, consider the signed number 14 stored in an
8-bit register. +14 is represented by a sign bit of 0 in the leftmost position
followed by the binary equivalent of 14: 00001110. Note that each of the
eight bits of the register must have a value and therefore 0’s must be
inserted in the most significant positions following the sign bit. Although
there is only one way to represent +14, there are three different ways
to represent -14 with eight bits.
In signed-magnitude representation 1 0001110
In signed-1’s complement representation 1 1110001
In signed-2’s complement representation 1 1110010
The signed-magnitude representation of -14 is obtained from +14 by
complementing only the sign bit. The signed-1’s complement representation
of -14 is obtained by complementing all the bits of +14, including the sign
bit. The signed-2’s complement representation is obtained by taking the 2’s
complement of the positive number, including its sign bit.
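For illustration, the three encodings of -14 can be generated with a short Python sketch; the helper name encodings is our own:

```python
def encodings(value, bits=8):
    """Return the signed-magnitude, 1's-complement and 2's-complement
    bit patterns of a negative integer in an n-bit register."""
    mag = abs(value)
    w = '0{}b'.format(bits)
    mask = (1 << bits) - 1
    signed_magnitude = format((1 << (bits - 1)) | mag, w)  # sign bit 1 + magnitude
    ones_complement = format(~mag & mask, w)               # flip every bit of +|value|
    twos_complement = format(-mag & mask, w)               # 1's complement plus 1
    return signed_magnitude, ones_complement, twos_complement

print(encodings(-14))  # ('10001110', '11110001', '11110010')
```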
The signed-magnitude system is used in ordinary arithmetic but is awkward
when employed in computer arithmetic, so a signed-complement
representation is normally used instead. The 1’s complement imposes
difficulties because it has two representations of 0 (+0 and -0), and it is
seldom used for arithmetic operations except in some older computers.
The 1’s complement is useful
as a logical operation since the change of 1 to 0 or 0 to 1 is equivalent to a
logical complement operation. The following discussion of signed binary
arithmetic deals exclusively with the signed-2’s complement representation
of negative numbers.
Arithmetic Addition
The addition of two numbers in the signed-magnitude system follows the
rules of ordinary arithmetic. If the signs are the same, we add the two
magnitudes and give the sum the common sign. If the signs are different,
we subtract the smaller magnitude from the larger and give the result the
sign of the larger magnitude. For example, (+25) + (-37) = -(37 – 25) = -12
and is done by subtracting the smaller magnitude 25 from the larger
magnitude 37 and using the sign of 37 for the sign of the result. This is a
process that requires the comparison of the signs and the magnitudes and
then performing either addition or subtraction.
By contrast, the rule for adding numbers in the signed-2’s complement
system does not require a comparison or subtraction, only addition and
complementation. The procedure is very simple and can be stated as
follows: Add the two numbers, including their sign bits, and discard
any carry out of the sign (leftmost) bit position. Numerical examples for
addition are shown below. Note that negative numbers must initially be
in 2’s complement and that if the sum obtained after the addition is
negative, it is in 2’s complement form.
+6 00000110 -6 11111010
+13 00001101 +13 00001101
----- ------------- ----- -------------
+19 00010011 +7 00000111
+6 00000110 -6 11111010
-13 11110011 -13 11110011
---- ------------- ----- -------------
-7 11111001 -19 11101101
In each of the four cases, the operation performed is always addition,
including the sign bits. Any carry out of the sign bit position is discarded,
and negative results are automatically in 2’s complement form.
The complement form of representing negative numbers is unfamiliar to
people used to the signed-magnitude system. To determine the value of a
negative number when in signed-2’s complement, it is necessary to convert
it to a positive number to place it in a more familiar form. For example, the
signed binary number 11111001 is negative because the leftmost bit is 1.
Its 2’s complement is 00000111, which is the binary equivalent of +7. We
therefore recognize the original negative number to be equal to -7.
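The addition rule can be sketched as a Python function; add_signed is an illustrative name of ours, and masking to n bits models the finite register (Python integers themselves are unbounded):

```python
def add_signed(a, b, bits=8):
    """Signed-2's complement addition: add everything, sign bits included,
    and discard any carry out of the sign position."""
    mask = (1 << bits) - 1
    s = ((a & mask) + (b & mask)) & mask              # masking drops the end carry
    return s - (1 << bits) if s >> (bits - 1) else s  # reinterpret pattern as signed

for x, y in [(6, 13), (-6, 13), (6, -13), (-6, -13)]:
    print(x, '+', y, '=', add_signed(x, y))  # 19, 7, -7, -19
```

The four calls reproduce the four numerical examples shown above.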
Arithmetic Subtraction
Subtraction of two signed binary numbers when negative numbers are in 2’s
complement form is very simple and can be stated as follows: Take the 2’s
complement of the subtrahend (including the sign bit) and add it to the
minuend (including the sign bit). A carry out of the sign bit position is
discarded.
This procedure stems from the fact that a subtraction operation can be
changed to an addition operation if the sign of the subtrahend is changed.
This is demonstrated by the following relationship:
(±A) – (+B) = (±A) + (-B)
(±A) – (-B) = (±A) + (+B)
But changing a positive number to a negative number is easily done by
taking its 2’s complement. The reverse is also true because the complement
of a negative number in complement form produces the equivalent positive
number. Consider the subtraction of (-6) – (-13) = +7. In binary with eight
bits this is written as 11111010 – 11110011. The subtraction is changed to
addition by taking the 2’s complement of the subtrahend (-13) to give (+13).
In binary this is 11111010 + 00001101 = 100000111. Removing the end
carry, we obtain the correct answer 00000111 (+7).
It is worth noting that binary numbers in the signed-2’s complement system
are added and subtracted by the same basic addition and subtraction rules
as unsigned numbers. Therefore, computers need only one common
hardware circuit to handle both types of arithmetic. The user or programmer
must interpret the results of such addition or subtraction differently
depending on whether it is assumed that the numbers are signed or
unsigned.
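The subtraction rule can be sketched the same way; sub_signed is our own illustrative name:

```python
def sub_signed(a, b, bits=8):
    """Signed-2's complement subtraction: add the 2's complement of the
    subtrahend (sign bit included) and discard the end carry."""
    mask = (1 << bits) - 1
    b_comp = (~b + 1) & mask                          # 2's complement of the subtrahend
    s = ((a & mask) + b_comp) & mask                  # masking discards the end carry
    return s - (1 << bits) if s >> (bits - 1) else s  # reinterpret pattern as signed

print(sub_signed(-6, -13))  # 7, the (-6) - (-13) example worked above
```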
Overflow
When two numbers of n digits each are added and the sum occupies n+1
digits, we say that an overflow has occurred. When the addition is
performed with paper and pencil, an overflow is not a problem since there is
no limit to the width of the page to write down the sum. An overflow is a
problem in digital computers because the width of registers is finite. A result
that contains n+1 bits cannot be accommodated in a register with a
standard length of n bits. For this reason, many computers detect the
occurrence of an overflow, and when it occurs, a corresponding flip-flop is
set which can then be checked by the user.
The detection of an overflow after the addition of two binary numbers
depends on whether the numbers are considered to be signed or unsigned.
When two unsigned numbers are added, an overflow is detected from the
end carry out of the most significant position. In the case of signed numbers,
the leftmost bit always represents the sign, and negative numbers are in 2’s
complement form. When two signed numbers are added, the sign bit is
treated as part of the number and the end carry does not indicate an
overflow.
An overflow cannot occur after an addition if one number is positive and the
other is negative, since adding a positive number to a negative number
produces a result that is smaller than the larger of the two original numbers.
An overflow may occur if the two numbers added are both positive or both
negative. To see how this can happen, consider the following example. Two
signed binary numbers, +70 and +80, are stored in two 8-bit registers. The
range of numbers that each register can accommodate is from binary +127
to binary -128. Since the sum of the two numbers is +150, it exceeds the
capacity of the 8-bit register. This is true if the numbers are both positive or
both negative. The two additions in binary are shown below together with
the last two carries.
carries: 0 1 carries: 1 0
+70 0 1000110 -70 1 0111010
+80 0 1010000 -80 1 0110000
------- ---------------- ------- ----------------
+150 1 0010110 -150 0 1101010
Note that the 8-bit result that should have been positive has a negative sign
bit and the 8-bit result that should have been negative has a positive sign
bit. If, however, the carry out of the sign bit position is taken as the sign bit
of the result, the 9-bit answer so obtained will be correct. Since the answer
cannot be accommodated within 8 bits, we say that an overflow occurred.
An overflow condition can be detected by observing the carry into the sign
bit position and the carry out of the sign bit position. If these two carries are
not equal, an overflow condition is produced. This is indicated in the
examples where the two carries are explicitly shown. If the two carries are
applied to an exclusive-OR gate, an overflow will be detected when the
output of the gate is equal to 1.
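The carry-comparison rule can be sketched in Python; add_with_overflow is a hypothetical helper of ours, and the exclusive-OR of the two carries plays the role of the XOR gate:

```python
def add_with_overflow(a, b, bits=8):
    """Add two n-bit signed 2's complement numbers; overflow is detected as
    (carry into the sign bit) XOR (carry out of the sign bit)."""
    mask = (1 << bits) - 1
    low = (a & (mask >> 1)) + (b & (mask >> 1))  # add all bits except the sign bits
    carry_in = (low >> (bits - 1)) & 1           # carry into the sign position
    total = (a & mask) + (b & mask)
    carry_out = (total >> bits) & 1              # end carry out of the sign position
    return total & mask, carry_in ^ carry_out

print(add_with_overflow(70, 80))    # (150, 1): pattern 10010110, overflow set
print(add_with_overflow(-70, -80))  # (106, 1): pattern 01101010, overflow set
```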
Decimal Fixed-Point Representation
The representation of decimal numbers in registers is a function of the
binary code used to represent a decimal digit. A 4-bit decimal code requires
four flip-flops for each decimal digit. The representation of 4385 in BCD
requires 16 flip-flops, four flip-flops for each digit. The number will be
represented in a register with 16 flip-flops as follows:
0100 0011 1000 0101
By representing numbers in decimal we are wasting a considerable amount
of storage space since the number of bits needed to store a decimal number
in a binary code is greater than the number of bits needed for its equivalent
binary representation. Also, the circuits required to perform decimal
arithmetic are more complex. However, there are some advantages in the
use of decimal representation because computer input and output data are
generated by people who use the decimal system. Some applications, such
as business data processing, require small amounts of arithmetic
computations compared to the amount required for input and output of
decimal data. For this reason, some computers and all electronic calculators
perform arithmetic operations directly with the decimal data (in a binary
code) and thus eliminate the need for conversion into binary and back to
decimal. Some computer systems have hardware for arithmetic calculations
with both binary and decimal data.
The representation of signed decimal numbers in BCD is similar to the
representation of signed numbers in binary. We can either use the familiar
signed-magnitude system or the signed-complement system. The sign of a
decimal number is usually represented with four bits to conform to the
4-bit code of the decimal digits. It is customary to designate a plus with four
0’s and a minus with the BCD equivalent of 9, which is 1001.
The signed-magnitude system is difficult to use with computers. The
signed-complement system can be either the 9’s or the 10’s complement,
but the 10’s complement is the one most often used. To obtain the 10’s
complement of a BCD number, we first take the 9’s complement and then
add one to the least significant digit. The 9’s complement is calculated from
the subtraction of each digit from 9.
The procedures developed for the signed-2’s complement system apply also
to the signed-10’s complement system for decimal numbers. Addition is
done by adding all digits, including the sign digit, and discarding the end
carry. Obviously, this assumes that all negative numbers are in 10’s
complement form. Consider the addition (+375) + (-240) = +135 done in the
signed-10’s complement system.
0 375 (0000 0011 0111 0101)BCD
+9 760 (1001 0111 0110 0000)BCD
-------- -----------------------------------
0 135 (0000 0001 0011 0101)BCD
The 9 in the leftmost position of the second number indicates that the
number is negative. 9760 is the 10’s complement of 0240. The two numbers
are added and the end carry is discarded to obtain +135. Of course, the
decimal numbers inside the computer must be in BCD, including the sign
digits. The addition is done with BCD adders.
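The signed-10’s complement scheme can be sketched in Python; to_bcd and add_signed_10s are illustrative helpers of ours, with the sign carried as an extra high-order decimal digit (0 for plus, 9 for minus):

```python
def to_bcd(n, digits):
    """Encode an unsigned decimal number as 4-bit BCD digit groups."""
    return [format(int(d), '04b') for d in str(n).zfill(digits)]

def add_signed_10s(a, b, digits=3):
    """Signed-10's complement addition: sign digit 0 means plus, 9 means minus."""
    def encode(v):                     # negatives go to 10's complement form
        return v if v >= 0 else 10**(digits + 1) + v
    s = (encode(a) + encode(b)) % 10**(digits + 1)   # add, discard the end carry
    return s if s < 10**digits else s - 10**(digits + 1)

print(add_signed_10s(375, -240))  # 135, the (+375) + (-240) example above
print(to_bcd(9760, 4))            # ['1001', '0111', '0110', '0000']
```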
The subtraction of decimal numbers either unsigned or in the signed-10’s
complement system is the same as in the binary case. Take the 10’s
complement of the subtrahend and add it to the minuend. Many computers
have special hardware to perform arithmetic calculations directly with
decimal numbers in BCD. The user of the computer can specify by
programmed instructions that the arithmetic operations be performed with
decimal numbers directly without having to convert them to binary.
1.6 Floating-Point Representation
The floating-point representation of a number has two parts. The first part
represents a signed, fixed-point number called the mantissa. The second
part designates the position of the decimal (or binary) point and is called the
exponent. The fixed-point mantissa may be a fraction or an integer. For
example, the decimal number +6132.789 is represented in floating-point
with a fraction and an exponent as follows:
Fraction Exponent
+0.6132789 +04
The value of the exponent indicates that the actual position of the decimal
point is four positions to the right of the indicated decimal point in the
fraction. This representation is equivalent to the scientific notation
+0.6132789 x 10^+4.
Floating-point is always interpreted to represent a number in the following
form:
m x r^e
Only the mantissa m and the exponent e are physically represented in the
register (including their signs). The radix r and the radix-point position of the
mantissa are always assumed. The circuits that manipulate the floating-
point numbers in registers conform with these two assumptions in order to
provide the correct computational results.
A floating-point binary number is represented in a similar manner except
that it uses base 2 for the exponent. For example, the binary number
+1001.11 is represented with an 8-bit fraction and 6-bit exponent as follows:
Fraction Exponent
01001110 000100
The fraction has a 0 in the leftmost position to denote positive. The binary
point of the fraction follows the sign bit but is not shown in the register. The
exponent has the equivalent binary number +4. The floating-point number is
equivalent to
m x 2^e = +(.1001110)_2 x 2^+4
A floating-point number is said to be normalized if the most significant digit
of the mantissa is nonzero. For example, the decimal number 350 is
normalized but 00035 is not. Regardless of where the position of the radix
point is assumed to be in the mantissa, the number is normalized only if its
leftmost digit is nonzero. For example, the 8-bit binary number 00011010 is
not normalized because of the three leading 0s. The number can be
normalized by shifting it three positions to the left and discarding the leading
0s to obtain 11010000. The three shifts multiply the number by 2^3 = 8. To
keep the same value for the floating-point number, the exponent must be
decreased by 3. Normalized numbers provide the maximum possible
precision for the floating-point number. A zero cannot be normalized
because it does not have a nonzero digit. It is usually represented in
floating-point by all 0s in the mantissa and exponent.
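The normalization step can be sketched in Python; normalize is an illustrative helper of ours, shifting the mantissa left past its leading 0s and decreasing the exponent once per shift so the value is unchanged:

```python
def normalize(mantissa, exponent):
    """Shift out leading 0s; each left shift multiplies the mantissa by 2,
    so the exponent is decreased by 1 to compensate."""
    while mantissa and mantissa[0] == '0' and '1' in mantissa:
        mantissa = mantissa[1:] + '0'   # one left shift, zero-fill on the right
        exponent -= 1
    return mantissa, exponent

print(normalize('00011010', 0))  # ('11010000', -3), the example above
```

A mantissa of all 0s (the representation of zero) is returned unchanged, since zero cannot be normalized.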
Arithmetic operations with floating-point numbers are more complicated than
arithmetic operations with fixed-point numbers and their execution takes
longer and requires more complex hardware. However, floating-point
representation is a must for scientific computations because of the scaling
problems involved with fixed-point computations. Many computers and all
electronic calculators have the built-in capability of performing floating-point
arithmetic operations. Computers that do not have hardware for floating-
point computations have a set of subroutines to help the user program
scientific problems with floating-point numbers.
1.7 Other Binary Codes
In previous sections we introduced the most common types of binary-coded
data found in digital computers. Other binary codes for decimal numbers
and alphanumeric characters are sometimes used. Digital computers also
employ other binary codes for special applications. A few additional binary
codes encountered in digital computers are presented in this section.
Gray Code
Digital systems can process data in discrete form only. Many physical
systems supply continuous output data. The data must be converted into
digital form before they can be used by a digital computer. Continuous, or
analog, information is converted into digital form by means of an analog-to-
digital converter. The reflected binary or Gray code, shown in Table 1.5, is
sometimes used for the converted digital data. The advantage of the Gray
code over straight binary numbers is that the Gray code changes by only
one bit as it sequences from one number to the next. In other words, the
change from any number to the next in sequence is recognized by a change
of only one bit from 0 to 1 or from 1 to 0. A typical application of the Gray
code occurs when the analog data are represented by the continuous
change of a shaft position. The shaft is partitioned into segments with each
segment assigned a number. If adjacent segments are made to correspond
to adjacent Gray code numbers, ambiguity is reduced when the shaft
position is in the line that separates any two segments.
Binary code   Decimal equivalent   Binary code   Decimal equivalent
0000          0                    1100          8
0001          1                    1101          9
0011          2                    1111          10
0010          3                    1110          11
0110          4                    1010          12
0111          5                    1011          13
0101          6                    1001          14
0100          7                    1000          15
Table 1.5: 4-Bit Gray Code
Gray code counters are sometimes used to provide the timing sequence
that controls the operations in a digital system. A Gray code counter is a
counter whose flip-flops go through a sequence of states as specified in
Table 1.5. Gray code counters remove the ambiguity during the change
from one state of the counter to the next because only one bit can change
during the state transition.
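One common way to generate the reflected binary code (a standard construction, not described in the text itself) is to XOR each binary number with itself shifted right by one bit. A short Python sketch:

```python
def binary_to_gray(n):
    """Reflected binary (Gray) code of n: XOR each bit with the bit above it."""
    return n ^ (n >> 1)

codes = [format(binary_to_gray(i), '04b') for i in range(16)]
print(codes[:4])  # ['0000', '0001', '0011', '0010'], matching Table 1.5
# adjacent codes (including the wrap from 15 back to 0) differ in exactly one bit
print(all(bin(int(a, 2) ^ int(b, 2)).count('1') == 1
          for a, b in zip(codes, codes[1:] + codes[:1])))  # True
```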
Other Decimal Codes
Binary codes for decimal digits require a minimum of four bits. Numerous
different codes can be formulated by arranging four or more bits in 10
distinct possible combinations. A few possibilities are shown in Table 1.6.
Decimal digit   BCD 8421   2421   Excess-3   Excess-3 Gray
0               0000       0000   0011       0010
1               0001       0001   0100       0110
2               0010       0010   0101       0111
3               0011       0011   0110       0101
4               0100       0100   0111       0100
5               0101       1011   1000       1100
6               0110       1100   1001       1101
7               0111       1101   1010       1111
8               1000       1110   1011       1110
9               1001       1111   1100       1010
Unused          1010       0101   0000       0000
bit             1011       0110   0001       0001
combinations    1100       0111   0010       0011
                1101       1000   1101       1000
                1110       1001   1110       1001
                1111       1010   1111       1011
Table 1.6: Four Different Binary Codes for the Decimal Digit
The BCD (binary-coded decimal) has been introduced before. It uses a
straight assignment of the binary equivalent of the digit. The six unused bit
combinations listed have no meaning when BCD is used, just as the letter H
has no meaning when decimal digit symbols are written down. For example,
saying that 1001 1110 is a decimal number in BCD is like saying that 9H is a
decimal number in the conventional symbol designation. Both cases contain
an invalid symbol and therefore designate a meaningless number.
One disadvantage of using BCD is the difficulty encountered when the 9’s
complement of the number is to be computed. On the other hand, the 9’s
complement is easily obtained with the 2421 and the excess-3 codes listed
in Table 1.6. These two codes have a self-complementing property which
means that the 9’s complement of a decimal number, when represented in
one of these codes, is easily obtained by changing 1’s to 0’s and 0’s to 1’s.
The property is useful when arithmetic operations are done in signed-
complement representation.
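The self-complementing property can be verified with a short Python sketch; the code tables below are copied from Table 1.6, and flip is an illustrative helper of ours:

```python
# 2421 and excess-3 encodings of the decimal digits 0..9 (from Table 1.6)
CODE_2421 = ['0000', '0001', '0010', '0011', '0100',
             '1011', '1100', '1101', '1110', '1111']
EXCESS_3 = [format(d + 3, '04b') for d in range(10)]

def flip(bits):
    """Complement every bit of a code word (change 1s to 0s and 0s to 1s)."""
    return bits.translate(str.maketrans('01', '10'))

# self-complementing: flipping the code of digit d gives the code of 9 - d
print(all(flip(CODE_2421[d]) == CODE_2421[9 - d] for d in range(10)))  # True
print(all(flip(EXCESS_3[d]) == EXCESS_3[9 - d] for d in range(10)))    # True
```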
The 2421 is an example of a weighted code. In a weighted code, the bits
are multiplied by the weights indicated and the sum of the weighted bits
gives the decimal digit. For example, the bit combination 1101,
when weighted by the respective weights 2, 4, 2, and 1, gives the decimal
equivalent of 2 x 1 + 4 x 1 + 2 x 0 + 1 x 1 = 7. The BCD code can be assigned the
weights 8421 and for this reason it is sometimes called the 8421 code.
The excess-3 code is a decimal code that has been used in older
computers. This is an unweighted code. Its binary code assignment is
obtained from the corresponding BCD equivalent binary number after the
addition of binary 3 (0011).
From Table 1.5 we note that the Gray code is not suited for a decimal code
if we were to choose the first 10 entries in the table. This is because the
transition from 9 back to 0 involves a change of three bits (from 1101 to
0000). To overcome this difficulty, we choose the 10 codes from the entry
0010 (decimal equivalent 3) through the entry 1010 (decimal equivalent 12).
Now the transition from
1010 to 0010 involves a change of only one bit. Since the code has been
shifted up three numbers, it is called the excess-3 Gray. This code is listed
with the other decimal codes in Table 1.6.
Other Alphanumeric Codes
The ASCII code (Table 1.4) is the standard code commonly used for the
transmission of binary information. Each character is represented by a 7-bit
code and usually an eighth bit is inserted for parity. The code consists of
128 characters. Ninety-five characters represent graphic symbols that
include upper- and lowercase letters, numerals zero to nine, punctuation
marks, and special symbols. Twenty-three characters represent format
effectors, which are functional characters for controlling the layout of printing
or display devices such as carriage return, line feed, horizontal tabulation,
and back space. The other 10 characters are used to direct the data
communication flow and report its status.
Another alphanumeric (sometimes called alphameric) code used in IBM
equipment is the EBCDIC (Extended BCD Interchange Code). It uses
eight bits for each character (and a ninth bit for parity). EBCDIC has the
same character symbols as ASCII but the bit assignment to characters is
different.
When alphanumeric characters are used internally in a computer for data
processing (not for transmission purposes) it is more convenient to use a
6-bit code to represent 64 characters. A 6-bit code can specify the 26
uppercase letters of the alphabet, numerals zero to nine, and up to 28
special characters. This set of characters is usually sufficient for data-
processing purposes. Using fewer bits to code characters has the
advantage of reducing the memory space needed to store large quantities of
alphanumeric data.
1.8 Summary
This unit explains the various units of a digital computer and the different
data formats used in computers. We have also learnt how numbers are
represented in digital computers and the different types of codes used in
computers and communication.
Self Assessment Questions
1. The word digital implies that the information in the computer is
represented by variables that take a limited number of ____________
values.
2. The _________ of the computer consists of all the electronic
components and electromechanical devices that comprise the physical
entity of the device.
3. The ______________ of a computer consists of a collection of
programs whose purpose is to make more effective use of the
computer.
4. The ____________ contains electronic circuits for communicating and
controlling the transfer of information between the computer and the
outside world.
5. ASCII stands for _________________________.
1.9 Terminal Questions
1. Convert the following binary numbers to decimal: 101110; 1110101; and
110110100.
2. Convert the following decimal numbers to binary: 1231; 673; and 1998.
3. Convert the hexadecimal number F3A7C2 to binary and octal.
4. Explain different types of binary codes.
1.10 Answers
Self Assessment Questions:
1. discrete
2. hardware
3. system software
4. input-output processor (IOP)
5. American Standard Code for Information Interchange
Terminal Questions:
1. Refer Section 1.3
2. Refer Section 1.3
3. Refer Section 1.3
4. Refer Section 1.7
Computer Organization and Architecture Unit 2
Sikkim Manipal University - DDE Page No. 37
Unit 2 Register Transfer and Microoperations
Structure:
2.1 Introduction
Objectives
2.2 Register Transfer Language
2.3 Register Transfer
2.4 Bus and Memory Transfers
2.5 Arithmetic Microoperations
2.6 Logic Microoperations
2.7 Shift Microoperations
2.8 Arithmetic Logic Shift Unit
2.9 Summary
2.10 Terminal Questions
2.11 Answers
2.1 Introduction
A digital system is an interconnection of digital hardware modules that
accomplish a specific information-processing task. Digital systems vary in
size and complexity, from a few integrated circuits to a complex of
interconnected and interacting digital computers. Digital system design
invariably uses a modular approach. The modules are constructed from
such digital components as registers, decoders, arithmetic elements, and
control logic. The various modules are interconnected with common data
and control paths to form a digital computer system.
Objectives:
After studying this unit, the learner will be able to
Explain Register Transfer Language
Understand bus selection techniques
Explain arithmetic and logical circuits
Understand the operation of arithmetic logic shift unit
2.2 Register Transfer Language
Digital modules are best defined by the registers they contain and the
operations that are performed on the data stored in them. The operations
executed on data stored in registers are called microoperations. A
microoperation is an elementary operation performed on the information
stored in one or more registers. The result of the operation may replace the
previous binary information of a register or may be transferred to another
register. Examples of microoperations are shift, count, clear, and load.
The internal hardware organization of a digital computer is best defined by
specifying:
1. The set of registers it contains and their function.
2. The sequence of microoperations performed on the binary
information stored in the registers.
3. The control that initiates the sequence of microoperations.
It is possible to specify the sequence of microoperations in a computer by
explaining every operation in words, but this procedure usually involves a
lengthy descriptive explanation. It is more convenient to adopt a suitable
symbology to describe the sequence of transfers between registers and the
various arithmetic and logic microoperations associated with the transfers.
The use of symbols instead of a narrative explanation provides an organized
and concise manner for listing the microoperation sequences in registers
and the control functions that initiate them.
The symbolic notation used to describe the microoperation transfers among
registers is called a register transfer language. The term “register transfer”
implies the availability of hardware logic circuits that can perform a stated
microoperation and transfer the result of the operation to the same or
another register. The word “language” is borrowed from programmers, who
apply this term to programming languages. A programming language is a
procedure for writing symbols to specify a given computational process.
Similarly, a natural language such as English is a system for writing symbols
and combining them into words and sentences for the purpose of
communication between people. A register transfer language is a system for
expressing in symbolic form the microoperation sequences among the
registers of a digital module. It is a convenient tool for describing the internal
organization of digital computers in a concise and precise manner. It can also
be used to facilitate the design process of digital systems.
The register transfer language adopted here is believed to be as simple as
possible, so it should not take very long to memorize. We will proceed to
define symbols for various types of microoperations, and at the same time,
describe associated hardware that can implement the stated
microoperations. The symbolic designation introduced in this chapter will be
utilized in subsequent chapters to specify the register transfers, the
microoperations, and the control functions that describe the internal
hardware organization of digital computers.
2.3 Register Transfer
Computer registers are designated by capital letters (sometimes followed by
numerals) to denote the function of the register. For example, the register
that holds an address for the memory unit is usually called a memory
address register and is designated by the name MAR. Other designations
for registers are PC (for program counter), IR (for instruction register),
and R1 (for processor register). The individual flip-flops in an n-bit register
are numbered in sequence from 0 through n – 1, starting from 0 in the
rightmost position and increasing the numbers toward the left. Fig.2.1
shows the representation of registers in block diagram form. The most
common way to represent a register is by a rectangular box with the name
of the register inside, as in Fig.2.1 (a). The individual bits can be
distinguished as in (b). The numbering of bits in a 16-bit register can be
marked on top of the box as shown in (c). A 16-bit register is partitioned into
two parts in (d). Bits 0 through 7 are assigned the symbol L (for low byte)
and bits 8 through 15 are assigned the symbol H (for high byte). The name
of the 16-bit register is PC. The symbol PC (0-7) or PC (L) refers to the
low-order byte and PC (8-15) or PC (H) to the high-order byte.
Fig. 2.1: Block diagram of register
Information transfer from one register to another is designated in symbolic
form by means of a replacement operator. The statement
R2 ← R1
denotes a transfer of the content of register R1 into register R2. It
designates a replacement of the content of R2 by the content of R1. By
definition, the content of the source register R1 does not change after the
transfer.
A statement that specifies a register transfer implies that circuits are
available from the outputs of the source register to the inputs of the
destination register and that the destination register has a parallel load
capability. Normally, we want the transfer to occur only under a
predetermined control condition. This can be shown by means of an if-then
statement.
If (P = 1) then (R2 ← R1)
where P is a control signal generated in the control section. It is sometimes
convenient to separate the control variables from the register transfer
operation by specifying a control function. A control function is a Boolean
variable that is equal to 1 or 0. The control function is included in the
statement as follows:
P: R2 ← R1
The control condition is terminated with a colon. It symbolizes the
requirement that the transfer operation be executed by the hardware only if
P = 1.
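A gated transfer of this kind can be sketched in software. The following Python fragment (all names are illustrative, not from the text) models one positive clock transition for the statement P: R2 ← R1:

```python
# Illustrative model of the gated transfer P: R2 <- R1.
def clock_edge(P, R1, R2):
    """Return the new content of R2 after one positive clock transition."""
    # The transfer is executed by the hardware only when P = 1;
    # otherwise R2 keeps its previous content.
    return R1 if P == 1 else R2

R1, R2 = 0b1010, 0b0000
R2 = clock_edge(P=0, R1=R1, R2=R2)   # control inactive: no transfer
assert R2 == 0b0000
R2 = clock_edge(P=1, R1=R1, R2=R2)   # P = 1: R2 <- R1
assert R2 == 0b1010
```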
Every statement written in a register transfer notation implies a hardware
construction for implementing the transfer. Fig.2.2 shows the block diagram
that depicts the transfer from R1 to R2. The n outputs of register R1 are
connected to the n inputs of register R2. The letter n will be used to indicate
any number of bits for the register. It will be replaced by an actual number
when the length of the register is known. Register R2 has a load input that
is activated by the control variable P. It is assumed that the control variable
is synchronized with the same clock as the one applied to the register. As
shown in the timing diagram (b), P is activated in the control section by the
rising edge of a clock pulse at time t. The next positive transition of the
clock at time t + 1 finds the load input active and the data inputs of R2 are
then loaded into the register in parallel. P may go back to 0 at time t + 1;
otherwise, the transfer will occur with every clock pulse transition while P
remains active.
Fig. 2.2: Transfer from R1 to R2 when P = 1
Note that the clock is not included as a variable in the register transfer
statements. It is assumed that all transfers occur during a clock edge
transition. Even though the control condition such as P becomes active just
after time t, the actual transfer does not occur until the register is triggered
by the next positive transition of the clock at time t + 1.
The basic symbols of the register transfer notation are listed in Table 2.1.
Registers are denoted by capital letters, and numerals may follow the
letters. Parentheses are used to denote a part of a register by specifying the
range of bits or by giving a symbol name to a portion of a register. The
arrow denotes a transfer of information and the direction of transfer. A
comma is used to separate two or more operations that are executed at the
same time. The statement
T: R2 ← R1, R1 ← R2
denotes an operation that exchanges the contents of two registers during
one common clock pulse provided that T = 1. This simultaneous operation is
possible with registers that have edge-triggered flip-flops.
Symbol                  Description                       Examples
Letters (and numerals)  Denotes a register                MAR, R2
Parentheses ( )         Denotes a part of a register      R2(0-7), R2(L)
Arrow ←                 Denotes transfer of information   R2 ← R1
Comma ,                 Separates two microoperations     R2 ← R1, R1 ← R2
Table 2.1: Basic Symbols for Register Transfers
2.4 Bus and Memory Transfers
A typical digital computer has many registers, and paths must be provided
to transfer information from one register to another. The number of wires will
be excessive if separate lines are used between each register and all other
registers in the system. A more efficient scheme for transferring information
between registers in a multiple-register configuration is a common bus
system. A bus structure consists of a set of common lines, one for each bit
of a register, through which binary information is transferred one at a time.
Control signals determine which register is selected by the bus during each
particular register transfer.
One way of constructing a common bus system is with multiplexers. The
multiplexers select the source register whose binary information is then
placed on the bus. The construction of a bus system for four registers is
shown in Fig.2.3. Each register has four bits, numbered 0 through 3. The
bus consists of four 4 x 1 multiplexers each having four data inputs, 0
through 3, and two selection inputs, S1 and S0. In order not to complicate
the diagram with 16 lines crossing each other, we use labels to show the
connections from the outputs of the registers to the inputs of the
multiplexers. For example, output 1 of register A is connected to input 0 of
MUX 1 because this input is labeled A1. The diagram shows that the bits in
the same significant position in each register are connected to the data
inputs of one multiplexer to form one line of the bus. Thus MUX 0
multiplexes the four 0 bits of the registers, MUX 1 multiplexes the four 1 bits
of the registers, and similarly for the other two bits.
Fig. 2.3: Bus system for four registers
The two selection lines S1 and S0 are connected to the selection inputs of
all four multiplexers. The selection lines choose the four bits of one register
and transfer them into the four-line common bus. When S1S0 = 00, the 0
data inputs of all four multiplexers are selected and applied to the outputs
that form the bus. This causes the bus lines to receive the content of
register A since the outputs of this register are connected to the 0 data
inputs of the multiplexers. Similarly, register B is selected if S1S0 = 01, and
so on. Table 2.2 shows the register that is selected by the bus for each of
the four possible binary values of the selection lines.
S1 S0 Register selected
0 0 A
0 1 B
1 0 C
1 1 D
Table 2.2: Function Table for Bus of Fig. 2.3
In general, a bus system will multiplex k registers of n bits each to produce
an n-line common bus. The number of multiplexers needed to construct the
bus is equal to n, the number of bits in each register. The size of each
multiplexer must be k x 1 since it multiplexes k data lines. For example, a
common bus for eight registers of 16 bits each requires 16 multiplexers, one
for each line in the bus. Each multiplexer must have eight data input lines
and three selection lines to multiplex one significant bit in the eight registers.
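As a sketch of this organization, the following Python function (hypothetical names) models one k × 1 multiplexer per bus line for the four-register bus of Fig. 2.3:

```python
def bus_value(registers, s1, s0, n_bits=4):
    """Model of Fig. 2.3: one multiplexer per bus line.

    registers holds the four register contents; (s1, s0) are the
    selection inputs common to all multiplexers.
    """
    k = (s1 << 1) | s0                  # which register the bus selects
    bus = 0
    for i in range(n_bits):             # MUX i drives bus line i
        bus |= ((registers[k] >> i) & 1) << i
    return bus

A, B, C, D = 0b0011, 0b0101, 0b1110, 0b1001
assert bus_value([A, B, C, D], 0, 0) == A    # S1S0 = 00 selects A
assert bus_value([A, B, C, D], 1, 1) == D    # S1S0 = 11 selects D
```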
The transfer of information from a bus into one of many destination registers
can be accomplished by connecting the bus lines to the inputs of all
destination registers and activating the load control of the particular
destination register selected. The symbolic statement for a bus transfer may
mention the bus or its presence may be implied in the statement. When the
bus is included in the statement, the register transfer is symbolized as
follows:
BUS ← C, R1 ← BUS
The content of register C is placed on the bus, and the content of the bus is
loaded into register R1 by activating its load control input. If the bus is
known to exist in the system, it may be convenient just to show the direct
transfer.
R1 ← C
From this statement the designer knows which control signals must be
activated to produce the transfer through the bus.
Three-State Bus Buffers
A bus system can be constructed with three-state gates instead of
multiplexers. A three-state gate is a digital circuit that exhibits three states.
Two of the states are signals equivalent to logic 1 and 0 as in a conventional
gate. The third state is a high-impedance state. The high-impedance state
behaves like an open circuit, which means that the output is disconnected
and does not have a logical significance. Three-state gates may perform
any conventional logic, such as AND or NAND. However, the one most
commonly used in the design of a bus system is the buffer gate.
The graphic symbol of a three-state buffer gate is shown in Fig.2.4. It is
distinguished from a normal buffer by having both a normal input and a
control input. The control input determines the output state. When the
control input is equal to 1, the output is enabled and the gate behaves like
any conventional buffer, with the output equal to the normal input. When the
control input is 0, the output is disabled and the gate goes to a high-
impedance state, regardless of the value in the normal input. The high-
impedance state of a three-state gate provides a special feature not
available in other gates. Because of this feature, a large number of three-
state gate outputs can be connected with wires to form a common bus line
without endangering loading effects.
Fig. 2.4: Graphic symbols for three-state buffer
The construction of a bus system with three-state buffers is demonstrated
in Fig.2.5. The outputs of four buffers are connected together to form a
single bus line. (It must be realized that this type of connection cannot be
done with gates that do not have three-state outputs). The control inputs to
the buffers determine which of the four normal inputs will communicate with
the bus line. No more than one buffer may be in the active state at any given
time. The connected buffers must be controlled so that only one three-state
buffer has access to the bus line while all other buffers are maintained in a
high-impedance state.
Fig. 2.5: Bus line with three state-buffers
One way to ensure that no more than one control input is active at any given
time is to use a decoder, as shown in the diagram. When the enable input
of the decoder is 0, all of its four outputs are 0, and the bus line is in a high-
impedance state because all four buffers are disabled. When the enable
input is active, one of the three-state buffers will be active, depending on the
binary value in the select inputs of the decoder. Careful investigation will
reveal that Fig.2.5 is another way of constructing a 4 x 1 multiplexer since
the circuit can replace the multiplexer in Fig.2.3.
To construct a common bus for four registers of n bits each using three-
state buffers, we need n circuits with four buffers in each as shown in
Fig.2.5. Each group of four buffers receives one significant bit from the four
registers. Each common output produces one of the lines for the common
bus for a total of n lines. Only one decoder is necessary to select between
the four registers.
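A sketch of one such bus line, with a 2 × 4 decoder guaranteeing that at most one three-state buffer drives it (names and the use of None for high impedance are illustrative):

```python
def bus_line(inputs, select, enable):
    """One bus line built from four three-state buffers (as in Fig. 2.5).

    High impedance is represented here by None: with enable = 0 every
    buffer is disabled and no value drives the line.
    """
    if not enable:
        return None                          # all decoder outputs 0: high-Z
    decoder_out = [int(i == select) for i in range(4)]
    # Exactly one control input is 1, so exactly one buffer drives the bus.
    return [a for a, c in zip(inputs, decoder_out) if c][0]

assert bus_line([1, 0, 1, 0], select=2, enable=1) == 1
assert bus_line([1, 0, 1, 0], select=1, enable=1) == 0
assert bus_line([1, 0, 1, 0], select=3, enable=0) is None
```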
Memory Transfer
The transfer of information from a memory word to the outside environment
is called a read operation. The transfer of new information to be stored into
the memory is called a write operation. A memory word will be symbolized
by the letter M. The particular memory word among the many available is
selected by the memory address during the transfer. It is necessary to
specify the address of M when writing memory transfer operations. This will
be done by enclosing the address in square brackets following the letter M.
Consider a memory unit that receives the address from a register, called the
address register, symbolized by AR. The data are transferred to another
register, called the data register, symbolized by DR. The read operation can
be stated as follows:
Read: DR ← M[AR]
This causes a transfer of information into DR from the memory word M
selected by the address in AR.
The write operation transfers the content of a data register to a memory
word M selected by the address. Assume that the input data are in register
R1 and the address is in AR. The write operation can be stated symbolically
as follows:
Write: M[AR] ← R1
This causes a transfer of information from R1 into the memory word M
selected by the address in AR.
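A minimal sketch of the two transfers, with the memory M modeled as a Python list indexed by the address register AR (all values are illustrative):

```python
# Read:  DR <- M[AR]      Write: M[AR] <- R1
M = [0] * 8               # a small 8-word memory
AR, R1 = 5, 0b1011        # address in AR, data in R1

M[AR] = R1                # write operation: store R1 at address AR
DR = M[AR]                # read operation: load DR from address AR
assert DR == 0b1011
```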
2.5 Arithmetic Microoperations
A microoperation is an elementary operation performed with the data
stored in registers. The microoperations most often encountered in digital
computers are classified into four categories:
1. Register transfer microoperations transfer binary information from
one register to another.
2. Arithmetic microoperations perform arithmetic operations on numeric
data stored in registers.
3. Logic microoperations perform bit manipulation operations on non-
numeric data stored in registers.
4. Shift microoperations perform shift operations on data stored in
registers.
The register transfer microoperation was introduced in Sec.2.3. This type of
microoperation does not change the information content when the binary
information moves from the source register to the destination register. The
other three types of microoperations change the information content during
the transfer. In this section we introduce a set of arithmetic microoperations.
In the next two sections we present the logic and shift microoperations.
The basic arithmetic microoperations are addition, subtraction, increment,
decrement, and shift. Arithmetic shifts are explained later in conjunction with
the shift microoperations. The arithmetic microoperation defined by the
statement
R3 ← R1 + R2
specifies an add microoperation. It states that the contents of register R1
are added to the contents of register R2 and the sum transferred to register
R3. To implement this statement with hardware we need three registers and
the digital component that performs the addition operation. The other basic
arithmetic microoperations are listed in Table 2.3. Subtraction is most often
implemented through complementation and addition. Instead of using the
minus operator, we can specify the subtraction by the following statement:
R3 ← R1 + R̅2̅ + 1
R̅2̅ is the symbol for the 1’s complement of R2. Adding 1 to the 1’s
complement produces the 2’s complement. Adding the contents of R1 to
the 2’s complement of R2 is equivalent to R1 – R2.
Symbolic designation    Description
R3 ← R1 + R2            Contents of R1 plus R2 transferred to R3
R3 ← R1 – R2            Contents of R1 minus R2 transferred to R3
R2 ← R̅2̅                Complement the contents of R2 (1’s complement)
R2 ← R̅2̅ + 1            2’s complement the contents of R2 (negate)
R3 ← R1 + R̅2̅ + 1       R1 plus the 2’s complement of R2 (subtraction)
R1 ← R1 + 1             Increment the contents of R1 by one
R1 ← R1 – 1             Decrement the contents of R1 by one
Table 2.3: Arithmetic Microoperations
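The subtraction-by-complement identity can be checked with a short sketch that keeps all arithmetic to n bits (the function name is an assumption for illustration):

```python
def subtract_via_complement(r1, r2, n_bits=4):
    """R3 <- R1 + (1's complement of R2) + 1, truncated to n bits."""
    mask = (1 << n_bits) - 1
    ones_complement = ~r2 & mask        # invert every bit of R2
    return (r1 + ones_complement + 1) & mask

assert subtract_via_complement(7, 3) == 4        # 7 - 3
# 3 - 7 yields the 2's complement representation of -4 (binary 1100)
assert subtract_via_complement(3, 7) == 0b1100
```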
The increment and decrement microoperations are symbolized by plus-one
and minus-one operations, respectively. These microoperations are
implemented with a combinational circuit or with a binary up-down counter.
The arithmetic operations of multiply and divide are not listed in Table 2.3.
These two operations are valid arithmetic operations but are not included in
the basic set of microoperations. The only place where these operations can
be considered as microoperations is in a digital system, where they are
implemented by means of a combinational circuit. In such a case, the
signals that perform these operations propagate through gates, and the
result of the operation can be transferred into a destination register by a
clock pulse as soon as the output signal propagates through the
combinational circuit. In most computers, the multiplication operation is
implemented with a sequence of add and shift microoperations. Division is
implemented with a sequence of subtract and shift microoperations. To
specify the hardware in such a case requires a list of statements that use
the basic microoperations of add, subtract and shift.
Binary Adder
To implement the add microoperation with hardware, we need the registers
that hold the data and the digital component that performs the arithmetic
addition. The digital circuit that forms the arithmetic sum of two bits and a
previous carry is called a full-adder. The digital circuit that generates the
arithmetic sum of two binary numbers of any lengths is called a binary
adder. The binary adder is constructed with full-adder circuits connected in
cascade, with the output carry from one full-adder connected to the input
carry of the next full-adder. Fig.2.6 shows the interconnections of four full-
adders (FA) to provide a 4-bit binary adder. The augend bits of A and the
addend bits of B are designated by subscript numbers from right to left, with
subscript 0 denoting the low-order bit. The carries are connected in a chain
through the full-adders. The input carry to the binary adder is C0 and the
output carry is C4. The S outputs of the full-adders generate the required
sum bits.
Fig. 2.6: 4-bit binary adder
An n-bit binary adder requires n full-adders. The output carry from each full-
adder is connected to the input carry of the next-high-order full-adder. The n
data bits for the A inputs come from one register (such as R1), and the n
data bits for the B inputs come from another register (such as R2). The sum
can be transferred to a third register or to one of the source registers (R1 or
R2), replacing its previous content.
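The ripple-carry structure of Fig. 2.6 can be sketched directly; bit lists are written low-order first, and all names are illustrative:

```python
def full_adder(a, b, c):
    """One FA stage: sum bit and carry out from two bits and a carry in."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def binary_adder(a_bits, b_bits, c0=0):
    """Cascade full-adders, chaining each carry into the next stage."""
    carry, s_bits = c0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        s_bits.append(s)
    return s_bits, carry                # sum bits and output carry

# 0110 (6) + 0011 (3) = 1001 (9); bit lists are low-order first
assert binary_adder([0, 1, 1, 0], [1, 1, 0, 0]) == ([1, 0, 0, 1], 0)
```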
Binary Adder-Subtractor
The subtraction of binary numbers can be done most conveniently by
means of complements. Remember that the subtraction, A – B can be done
by taking the 2’s complement of B and adding it to A. The 2’s complement
can be obtained by taking the 1’s complement and adding one to the least
significant pair of bits. The 1’s complement can be implemented with
inverters and a one can be added to the sum through the input carry.
The addition and subtraction operations can be combined into one common
circuit by including an exclusive-OR gate with each full-adder. A 4-bit adder-
subtractor circuit is shown in Fig.2.7. The mode input M controls the
operation. When M = 0 the circuit is an adder and when M = 1 the circuit
becomes a subtractor. Each exclusive-OR gate receives input M and one of
the inputs of B. When M = 0, we have B ⊕ 0 = B. The full-adders receive
the value of B, the input carry is 0, and the circuit performs A plus B. When
M = 1, we have B ⊕ 1 = B̅ and C0 = 1. The B inputs are all complemented
and a 1 is added through the input carry. The circuit performs the operation
A plus the 2’s complement of B. For unsigned numbers, this gives A – B if
A ≥ B or the 2’s complement of (B – A) if A < B. For signed numbers, the
result is A – B provided that there is no overflow.
Fig. 2.7: 4-bit adder-subtractor
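A behavioural sketch of the adder-subtractor, with the per-stage exclusive-OR and the mode input M feeding the input carry (the function name is hypothetical):

```python
def adder_subtractor(a, b, m, n_bits=4):
    """Fig. 2.7 behaviour: M = 0 gives A + B, M = 1 gives A - B."""
    carry, result = m, 0                 # the mode input also supplies C0
    for i in range(n_bits):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ m          # exclusive-OR gate on each B input
        s = ai ^ bi ^ carry              # full-adder sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result

assert adder_subtractor(0b0101, 0b0011, m=0) == 0b1000   # 5 + 3 = 8
assert adder_subtractor(0b0101, 0b0011, m=1) == 0b0010   # 5 - 3 = 2
```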
Binary Incrementer
The increment microoperation adds one to a number in a register. For
example, if a 4-bit register has a binary value 0110, it will go to 0111 after it
is incremented. This microoperation is easily implemented with a binary
counter. Every time the count enable is active, the clock pulse transition
increments the content of the register by one. There may be occasions
when the increment microoperation must be done with a combinational
circuit independent of a particular register. This can be accomplished by
means of half-adders connected in cascade.
The diagram of a 4-bit combinational circuit incrementer is shown in Fig.2.8.
One of the inputs to the least significant half-adder (HA) is connected to
logic-1 and the other input is connected to the least significant bit of the
number to be incremented. The output carry from one half-adder is
connected to one of the inputs of the next-higher-order half-adder. The
circuit receives the four bits from A0 through A3, adds one to it, and
generates the incremented output in S0 through S3. The output carry C4 will
be 1 only after incrementing binary 1111. This also causes outputs S0
through S3 to go to 0.
Fig. 2.8: 4-bit binary incrementer
The circuit of Fig.2.8 can be extended to an n-bit binary incrementer by
extending the diagram to include n half-adders. The least significant bit must
have one input connected to logic-1. The other inputs receive the number to
be incremented or the carry from the previous stage.
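The half-adder cascade of Fig. 2.8 can be sketched as follows (bit lists are low-order first; all names are illustrative):

```python
def half_adder(a, b):
    """Sum and carry of two bits."""
    return a ^ b, a & b

def incrementer(bits):
    """Cascade of half-adders with logic-1 into the least significant one."""
    carry, out = 1, []
    for a in bits:                      # bits[0] is the low-order bit
        s, carry = half_adder(a, carry)
        out.append(s)
    return out, carry                   # incremented bits and output carry

assert incrementer([0, 1, 1, 0]) == ([1, 1, 1, 0], 0)   # 0110 -> 0111
assert incrementer([1, 1, 1, 1]) == ([0, 0, 0, 0], 1)   # 1111 wraps to 0000
```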
Arithmetic Circuit
The arithmetic microoperations listed in Table 2.3 can be implemented in
one composite arithmetic circuit. The basic component of an arithmetic
circuit is the parallel adder. By controlling the data inputs to the adder, it is
possible to obtain different types of arithmetic operations.
The diagram of a 4-bit arithmetic circuit is shown in Fig.2.9. It has four full-
adder circuits that constitute the 4-bit adder and four multiplexers for
choosing different operations. There are two 4-bit inputs A and B and a
4-bit output D. The four inputs from A go directly to the X inputs of the binary
adder. Each of the four inputs from B is connected to the data inputs of the
multiplexers. The multiplexer data inputs also receive the complement of B.
The other two data inputs are connected to logic-0 and logic-1. Logic-0 is a
fixed voltage value (0 volts for TTL integrated circuits) and the logic-1 signal
can be generated through an inverter whose input is 0. The four multiplexers
are controlled by two selection inputs, S1 and S0. The input carry Cin goes to
the carry input of the FA in the least significant position. The other carries
are connected from one stage to the next.
Fig. 2.9: 4-bit arithmetic circuit
The output of the binary adder is calculated from the following arithmetic
sum:
D = A + Y + Cin
where A is the 4-bit binary number at the X inputs and Y is the 4-bit binary
number at the Y inputs of the binary adder. Cin is the input carry, which can
be equal to 0 or 1. Note that the symbol + in the equation above denotes an
arithmetic plus. By controlling the value of Y with the two selection inputs S1
and S0 and making Cin equal to 0 or 1, it is possible to generate the eight
arithmetic microoperations listed in Table 2.4.
S1 S0 Cin   Input Y   Output D = A + Y + Cin   Microoperation
0  0  0     B         D = A + B                Add
0  0  1     B         D = A + B + 1            Add with carry
0  1  0     B̅         D = A + B̅                Subtract with borrow
0  1  1     B̅         D = A + B̅ + 1            Subtract
1  0  0     0         D = A                    Transfer A
1  0  1     0         D = A + 1                Increment A
1  1  0     1         D = A – 1                Decrement A
1  1  1     1         D = A                    Transfer A
Table 2.4: Arithmetic Circuit Function Table
When S1S0 = 00, the value of B is applied to the Y inputs of the adder. If
Cin = 0, the output D = A + B. If Cin = 1, output D = A + B + 1. Both cases
perform the add microoperation with or without adding the input carry.
When S1S0 = 01, the complement of B is applied to the Y inputs of the
adder. If Cin = 1, then D = A + B̅ + 1. This produces A plus the 2’s
complement of B, which is equivalent to the subtraction A – B. When
Cin = 0, then D = A + B̅. This is equivalent to a subtract with borrow, that is,
A – B – 1.
When S1S0 = 10, the inputs from B are neglected, and instead, all 0’s are
inserted into the Y inputs. The output becomes D = A + 0 + Cin. This gives
D = A when Cin = 0 and D = A + 1 when Cin = 1. In the first case we have a
direct transfer from input A to output D. In the second case, the value of A is
incremented by 1.
When S1S0 = 11, all 1’s are inserted into the Y inputs of the adder to
produce the decrement operation D = A – 1 when Cin = 0. This is because a
number with all 1’s is equal to the 2’s complement of 1 (the 2’s complement
of binary 0001 is 1111). Adding a number A to the 2’s complement of 1
produces D = A + 2’s complement of 1 = A – 1. When Cin = 1, then
D = A – 1 + 1 = A, which causes a direct transfer from input A to output D.
Note that the microoperation D = A is generated twice, so there are only
seven distinct microoperations in the arithmetic circuit.
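Table 2.4 can be verified with a behavioural sketch in which the multiplexers choose Y and the adder forms D = A + Y + Cin (function and variable names are assumptions):

```python
def arithmetic_circuit(a, b, s1, s0, cin, n_bits=4):
    """D = A + Y + Cin, with Y selected by S1S0 as in Table 2.4."""
    mask = (1 << n_bits) - 1
    # S1S0 = 00: B, 01: complement of B, 10: all 0's, 11: all 1's
    y = [b, ~b & mask, 0, mask][(s1 << 1) | s0]
    return (a + y + cin) & mask

A, B = 0b0110, 0b0011                 # A = 6, B = 3
assert arithmetic_circuit(A, B, 0, 0, 0) == 9    # add
assert arithmetic_circuit(A, B, 0, 1, 1) == 3    # subtract: A - B
assert arithmetic_circuit(A, B, 1, 0, 1) == 7    # increment A
assert arithmetic_circuit(A, B, 1, 1, 0) == 5    # decrement A
```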
2.6 Logic Microoperations
Logic microoperations specify binary operations for strings of bits stored in
registers. These operations consider each bit of the register separately and
treat them as binary variables. For example, the exclusive-OR
microoperation with the contents of two registers R1 and R2 is symbolized
by the statement
P: R1 ← R1 ⊕ R2
It specifies a logic microoperation to be executed on the individual bits of the
registers provided that the control variable P = 1. As a numerical example,
assume that each register has four bits. Let the content of R1 be 1010 and
the content of R2 be 1100. The exclusive-OR microoperation stated above
symbolizes the following logic computation:
1010 Content of R1
1100 Content of R2
0110 Content of R1 after P = 1
The content of R1, after the execution of the microoperation, is equal to the
bit-by-bit exclusive-OR operation on pairs of bits in R2 and previous values
of R1. The logic microoperations are seldom used in scientific computations,
but they are very useful for bit manipulation of binary data and for making
logical decisions.
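Python’s bitwise operators reproduce these bit-by-bit microoperations directly; the following checks use the 4-bit register values from the example above:

```python
MASK = 0b1111                 # keep results to four bits
R1, R2 = 0b1010, 0b1100       # register contents from the example

assert (R1 ^ R2) & MASK == 0b0110   # exclusive-OR, as computed above
assert (R1 & R2) & MASK == 0b1000   # AND microoperation
assert (R1 | R2) & MASK == 0b1110   # OR microoperation
assert (~R1) & MASK == 0b0101       # complement (1's complement)
```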
Special symbols will be adopted for the logic microoperations OR, AND, and
complement, to distinguish them from the corresponding symbols used to
express Boolean functions. The symbol ∨ will be used to denote an OR
microoperation and the symbol ∧ to denote an AND microoperation. The
complement microoperation is the same as the 1’s complement and uses a
bar on top of the symbol that denotes the register name. By using different
symbols, it will be possible to differentiate between a logic microoperation
and a control (or Boolean) function. Another reason for adopting two sets of
symbols is to be able to distinguish the symbol +, when used to symbolize
an arithmetic plus, from a logic OR operation. Although the + symbol has
two meanings, it will be possible to distinguish between them by noting
where the symbol occurs. When the symbol + occurs in a microoperation, it
will denote an arithmetic plus. When it occurs in a control (or Boolean)
function, it will denote an OR operation. We will never use it to symbolize
an OR microoperation. For example, in the statement
P + Q: R1 ← R2 + R3, R4 ← R5 ∨ R6
the + between P and Q is an OR operation between two binary variables of
a control function. The + between R2 and R3 specifies an add
microoperation. The OR microoperation is designated by the symbol ∨
between registers R5 and R6.
List of Logic Microoperations
There are 16 different logic operations that can be performed with two
binary variables. They can be determined from all possible truth tables
obtained with two binary variables as shown in Table 2.5. In this table, each
of the 16 columns F0 through F15 represents a truth table of one possible
Boolean function for the two variables x and y. Note that the functions are
determined from the 16 binary combinations that can be assigned to F.
x y F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Table 2.5: Truth Tables for 16 Functions of Two Variables
The 16 Boolean functions of two variables x and y are expressed in
algebraic form in the first column of Table 2.6. The 16 logic microoperations
are derived from these functions by replacing variable x by the binary
content of register A and variable y by the binary content of register B. It is
important to realize that the Boolean functions listed in the first column of
Table 2.6 represent a relationship between two binary variables x and y.
The logic microoperations listed in the second column represent a
relationship between the binary content of two registers A and B. Each bit
of the register is treated as a binary variable and the microoperation is
performed on the string of bits stored in the registers.
Boolean function Microoperation Name
F0 = 0 F ← 0 Clear
F1 = xy F ← A ∧ B AND
F2 = xy’ F ← A ∧ B’
F3 = x F ← A Transfer A
F4 = x’y F ← A’ ∧ B
F5 = y F ← B Transfer B
F6 = x ⊕ y F ← A ⊕ B Exclusive-OR
F7 = x + y F ← A ∨ B OR
F8 = (x + y)’ F ← (A ∨ B)’ NOR
F9 = (x ⊕ y)’ F ← (A ⊕ B)’ Exclusive-NOR
F10 = y’ F ← B’ Complement B
F11 = x + y’ F ← A ∨ B’
F12 = x’ F ← A’ Complement A
F13 = x’ + y F ← A’ ∨ B
F14 = (xy)’ F ← (A ∧ B)’ NAND
F15 = 1 F ← all 1’s Set to all 1’s
Table 2.6: Sixteen Logic Microoperations
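Since each column Fi of Table 2.5 is simply the 4-bit binary expansion of the index i, all 16 functions can be generated programmatically; a minimal sketch (the function name is an assumption):

```python
def f(i, x, y):
    """Output of function Fi for inputs x, y (column Fi of Table 2.5)."""
    # rows are ordered (0,0), (0,1), (1,0), (1,1); column Fi is i in binary
    row = 2 * x + y
    return (i >> (3 - row)) & 1

# Spot checks against Table 2.6: F1 is AND, F6 is XOR, F7 is OR
assert all(f(1, x, y) == (x & y) for x in (0, 1) for y in (0, 1))
assert all(f(6, x, y) == (x ^ y) for x in (0, 1) for y in (0, 1))
assert all(f(7, x, y) == (x | y) for x in (0, 1) for y in (0, 1))
```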
Hardware Implementation
The hardware implementation of logic microoperations requires that logic
gates be inserted for each bit or pair of bits in the registers to perform the
required logic function. Although there are 16 logic microoperations, most
computers use only four – AND, OR, XOR (exclusive-OR), and
complement–from which all others can be derived.
Fig. 2.10 shows one stage of a circuit that generates the four basic logic
microoperations. It consists of four gates and a multiplexer. Each of the four
logic operations is generated through a gate that performs the required
logic. The outputs of the gates are applied to the data inputs of the
multiplexer. The two selection inputs S1 and S0 choose one of the data
inputs of the multiplexer and direct its value to the output. The diagram
shows one typical stage with subscript i. For a logic circuit with n bits, the
diagram must be repeated n times for i = 0, 1, 2, …, n-1. The selection
variables are applied to all stages. The function table in Fig.2.10 (b) lists the
logic microoperations obtained for each combination of the selection
variables.
Fig. 2.10: One stage of logic circuit
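The stage of Fig. 2.10 can be modeled behaviorally. The gate ordering on the multiplexer inputs (S1S0 = 00 for AND, 01 for OR, 10 for XOR, 11 for complement) is an assumption here, since the figure itself is not reproduced:

```python
def logic_stage(ai, bi, s1, s0):
    """One bit stage: four gates feed a 4x1 mux selected by S1 S0."""
    gates = [ai & bi,   # S1S0 = 00: AND
             ai | bi,   # S1S0 = 01: OR
             ai ^ bi,   # S1S0 = 10: XOR
             1 - ai]    # S1S0 = 11: complement of A
    return gates[2 * s1 + s0]

print(logic_stage(1, 1, 0, 0))  # 1 (AND of two 1 bits)
print(logic_stage(1, 0, 1, 1))  # 0 (complement of Ai = 1)
```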
Some Applications
Logic microoperations are very useful for manipulating individual bits or a
portion of a word stored in a register. They can be used to change bit
values, delete a group of bits, or insert new bit values into a register. The
following examples show how the bits of one register (designated by A) are
manipulated by logic microoperations as a function of the bits of another
register (designated by B). In a typical application, register A is a processor
register and the bits of register B constitute a logic operand extracted from
memory and placed in register B.
The selective-set operation sets to 1 the bits in register A where there are
corresponding 1’s in register B. It does not affect bit positions that have 0’s
in B. The following numerical example clarifies this operation:
1010 A before
1100 B (logic operand)
1110 A after
The two leftmost bits of B are 1’s, so the corresponding bits of A are set to
1. One of these two bits was already set and the other has been changed
from 0 to 1. The two bits of A with corresponding 0’s in B remain
unchanged. The example above serves as a truth table since it has all four
possible combinations of two binary variables. From the truth table we note
that the bits of A after the operation are obtained from the logic-OR
operation of bits in B and previous values of A. Therefore, the OR
microoperation can be used to selectively set bits of a register.
The selective-complement operation complements bits in A where there are
corresponding 1’s in B. It does not affect bit positions that have 0’s in B.
For example:
1010 A before
1100 B (logic operand)
0110 A after
Again the two leftmost bits of B are 1’s, so the corresponding bits of A are
complemented. This example again can serve as a truth table from which
one can deduce that the selective-complement operation is just an
exclusive-OR microoperation. Therefore, the exclusive-OR microoperation
can be used to selectively complement bits of a register.
The selective-clear operation clears to 0 the bits in A only where there are
corresponding 1’s in B. For example:
1010 A before
1100 B (logic operand)
0010 A after
Again the two leftmost bits of B are 1’s, so the corresponding bits of A are
cleared to 0. One can deduce that the Boolean operation performed on the
individual bits is AB’. The corresponding logic microoperation is
A ← A ∧ B’
The mask operation is similar to the selective-clear operation except that the
bits of A are cleared only where there are corresponding 0’s in B. The mask
operation is an AND microoperation as seen from the following numerical
example:
1010 A before
1100 B (logic operand)
1000 A after masking
The two rightmost bits of A are cleared because the corresponding bits of B
are 0s. The two leftmost bits are left unchanged because the corresponding
bits of B are 1s. The mask operation is more convenient to use than the
selective-clear operation because most computers provide an AND
instruction, and few provide an instruction that executes the microoperation
for selective-clear.
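The four selective operations above each reduce to a single bitwise operator; a sketch using the 4-bit operands from the examples (the function names and width are illustrative assumptions):

```python
def selective_set(a, b):
    return a | b                              # OR sets where B has 1's

def selective_complement(a, b):
    return a ^ b                              # XOR complements where B has 1's

def selective_clear(a, b, w=4):
    return a & (~b & ((1 << w) - 1))          # A AND B' clears where B has 1's

def mask(a, b):
    return a & b                              # AND clears where B has 0's

a, b = 0b1010, 0b1100
print(format(selective_set(a, b), "04b"))         # 1110
print(format(selective_complement(a, b), "04b"))  # 0110
print(format(selective_clear(a, b), "04b"))       # 0010
print(format(mask(a, b), "04b"))                  # 1000
```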
The insert operation inserts a new value into a group of bits. This is done by
first masking the bits and then ORing them with the required value. For
example, suppose that an A register contains eight bits, 0110 1010. To
replace the four leftmost bits by the value 1001 we first mask the four
unwanted bits:
0110 1010 A before
0000 1111 B (mask)
0000 1010 A after masking
and then insert the new value:
0000 1010 A before
1001 0000 B (insert)
1001 1010 A after insertion
The mask operation is an AND microoperation and the insert operation is an
OR microoperation.
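The two-step insert can be sketched directly with the 8-bit values from the example:

```python
A = 0b0110_1010   # A before
A &= 0b0000_1111  # mask: AND clears the four leftmost bits -> 0000 1010
A |= 0b1001_0000  # insert: OR brings in the new value      -> 1001 1010
print(format(A, "08b"))  # 10011010
```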
The clear operation compares the words in A and B and produces an all 0s
result if the two numbers are equal. This operation is achieved by an
exclusive-OR microoperation as shown by the following example:
1010 A
1010 B
0000 A ← A ⊕ B
When A and B are equal, the two corresponding bits are either both 0 or
both 1. In either case the exclusive-OR operation produces a 0. The all-0s
result is then checked to determine if the two numbers were equal.
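This comparison amounts to a word-wide XOR followed by a zero test; a sketch (the function name is an assumption):

```python
def equal(a, b):
    """An all-0's XOR result means every pair of bits matched."""
    return (a ^ b) == 0

print(equal(0b1010, 0b1010))  # True
print(equal(0b1010, 0b1100))  # False
```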
2.7 Shift Microoperations
Shift microoperations are used for serial transfer of data. They are also used
in conjunction with arithmetic, logic, and other data-processing operations.
The contents of a register can be shifted to the left or the right. At the same
time that the bits are shifted, the first flip-flop receives its binary information
from the serial input. During a shift-left operation the serial input transfers a
bit into the rightmost position. During a shift-right operation the serial input
transfers a bit into the leftmost position. The information transferred through
the serial input determines the type of shift. There are three types of shifts:
logical, circular, and arithmetic.
A logical shift is one that transfers 0 through the serial input. We will adopt
the symbols shl and shr for logical shift-left and shift-right microoperations.
For example:
R1 ← shl R1
R2 ← shr R2
are two microoperations that specify a 1-bit shift to the left of the content of
register R1 and a 1-bit shift to the right of the content of register R2. The
register symbol must be the same on both sides of the arrow. The bit
transferred to the end position through the serial input is assumed to be 0
during a logical shift.
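A sketch of the two logical shifts on a hypothetical 4-bit register, with 0 entering through the serial input:

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def shl(r):
    """Logical shift-left: 0 enters the rightmost position."""
    return (r << 1) & MASK

def shr(r):
    """Logical shift-right: 0 enters the leftmost position."""
    return r >> 1

print(format(shl(0b1010), "04b"))  # 0100
print(format(shr(0b1010), "04b"))  # 0101
```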
The circular shift (also known as a rotate operation) circulates the bits of the
register around the two ends without loss of information. This is
accomplished by connecting the serial output of the shift register to its serial
input. We will use the symbols cil and cir for the circular shift left and right,
respectively. The symbolic notation for the shift microoperations is shown in
Table 2.7.
Symbolic designation Description
R ← shl R Shift-left register R
R ← shr R Shift-right register R
R ← cil R Circular shift-left register R
R ← cir R Circular shift-right register R
R ← ashl R Arithmetic shift-left R
R ← ashr R Arithmetic shift-right R
Table 2.7: Shift Microoperations
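The circular shifts simply route the bit falling off one end back into the other; a 4-bit sketch whose function names follow the cil/cir symbols above:

```python
def cil(r, w=4):
    """Circular shift-left: the leftmost bit re-enters on the right."""
    return ((r << 1) | (r >> (w - 1))) & ((1 << w) - 1)

def cir(r, w=4):
    """Circular shift-right: the rightmost bit re-enters on the left."""
    return (r >> 1) | ((r & 1) << (w - 1))

print(format(cil(0b1010), "04b"))  # 0101
print(format(cir(0b1010), "04b"))  # 0101
```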
An arithmetic shift is a microoperation that shifts a signed binary number to
the left or right. An arithmetic shift-left multiplies a signed binary number by
2. An arithmetic shift-right divides the number by 2. Arithmetic shifts must
leave the sign bit unchanged because the sign of the number remains the
same when it is multiplied or divided by 2. The leftmost bit in a register holds
the sign bit, and the remaining bits hold the number. The sign bit is 0 for
positive and 1 for negative. Negative numbers are in 2’s complement form.
Fig.2.11 shows a typical register of n bits. Bit Rn-1 in the leftmost position
holds the sign bit. Rn-2 is the most significant bit of the number and R0 is the
least significant bit. The arithmetic shift-right leaves the sign bit unchanged
and shifts the number (including the sign bit) to the right. Thus Rn-1 remains
the same, Rn-2 receives the bit from Rn-1, and so on for the other bits in the
register. The bit in R0 is lost.
Fig. 2.11: Arithmetic shift right
The arithmetic shift-left inserts a 0 into R0, and shifts all other bits to the left.
The initial bit of Rn-1 is lost and replaced by the bit from Rn-2. A sign reversal
occurs if the bit in Rn-1 changes in value after the shift. This happens if the
multiplication by 2 causes an overflow. An overflow occurs after an
arithmetic shift left if initially, before the shift, Rn-1 is not equal to Rn-2. An
overflow flip-flop Vs can be used to detect an arithmetic shift-left overflow.
Vs = Rn-1 ⊕ Rn-2
If Vs = 0, there is no overflow, but if Vs = 1, there is an overflow and a sign
reversal after the shift. Vs must be transferred into the overflow flip-flop with
the same clock pulse that shifts the register.
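A sketch of both arithmetic shifts on a hypothetical 4-bit register, including the overflow bit Vs = Rn-1 ⊕ Rn-2 sampled before the left shift:

```python
WIDTH = 4

def ashr(r):
    """Arithmetic shift-right: sign bit Rn-1 stays; R0 is lost."""
    sign = r & (1 << (WIDTH - 1))
    return sign | (r >> 1)

def ashl(r):
    """Arithmetic shift-left: 0 enters R0; returns (result, Vs)."""
    vs = ((r >> (WIDTH - 1)) & 1) ^ ((r >> (WIDTH - 2)) & 1)
    return (r << 1) & ((1 << WIDTH) - 1), vs

print(format(ashr(0b1010), "04b"))  # 1101 (-6 / 2 = -3 in 2's complement)
res, vs = ashl(0b1010)
print(format(res, "04b"), vs)       # 0100 1 (overflow: the sign reversed)
```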
Hardware Implementation
A possible choice for a shift unit would be a bidirectional shift register with
parallel load. Information can be transferred to the register in parallel and
then shifted to the right or left. In this type of configuration, a clock pulse is
needed for loading the data into the register, and another pulse is needed to
initiate the shift. In a processor unit with many registers it is more efficient to
implement the shift operation with a combinational circuit. In this way the
content of a register that has to be shifted is first placed onto a common bus
whose output is connected to the combinational shifter, and the shifted
number is then loaded back into the register. This requires only one clock
pulse for loading the shifted value into the register.
A combinational circuit shifter can be constructed with multiplexers as
shown in Fig.2.12. The 4-bit shifter has four data inputs, A0 through A3, and
four data outputs, H0 through H3. There are two serial inputs, one for shift
left (IL) and the other for shift right (IR). When the selection input S=0, the
input data are shifted right (down in the diagram). When S=1, the input data
are shifted left (up in the diagram). The function table in Fig.2.12 shows
which input goes to each output after the shift. A shifter with n data inputs
and outputs requires n multiplexers. The two serial inputs can be controlled
by another multiplexer to provide the three possible types of shifts.
Fig. 2.12: 4-bit combinational circuit shifter
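The shifter of Fig. 2.12 can be modeled as four 2x1 multiplexers. The routing below follows the text (serial input IR enters the leftmost position on a right shift, IL enters the rightmost on a left shift); since the figure is not reproduced, the exact function table is an assumption:

```python
def shifter4(a, s, i_r=0, i_l=0):
    """a is the bit list [A0, A1, A2, A3] (A3 leftmost); returns [H0..H3]."""
    if s == 0:
        # shift right: each bit moves one place toward A0; IR fills H3
        return [a[1], a[2], a[3], i_r]
    # shift left: each bit moves one place toward A3; IL fills H0
    return [i_l, a[0], a[1], a[2]]

print(shifter4([1, 0, 1, 0], s=0, i_r=1))  # [0, 1, 0, 1]
print(shifter4([1, 0, 1, 0], s=1, i_l=0))  # [0, 1, 0, 1]
```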
2.8 Arithmetic Logic Shift Unit
Instead of having individual registers performing the microoperations
directly, computer systems employ a number of storage registers connected
to a common operational unit called an arithmetic logic unit, abbreviated
ALU. To perform a microoperation, the contents of specified registers are
placed in the inputs of the common ALU. The ALU performs an operation
and the result of the operation is then transferred to a destination register.
The ALU is a combinational circuit so that the entire register transfer
operation from the source registers through the ALU and into the destination
register can be performed during one clock pulse period. The shift
microoperations are often performed in a separate unit, but sometimes the
shift unit is made part of the overall ALU.
The arithmetic, logic, and shift circuits introduced in previous sections can
be combined into one ALU with common selection variables. One stage of
an arithmetic logic shift unit is shown in Fig.2.13. The subscript i designates
a typical stage. Inputs Ai and Bi are applied to both the arithmetic and logic
units. A particular microoperation is selected with inputs S1 and S0. A 4 x 1
multiplexer at the output chooses between an arithmetic output in Ei and a
logic output in Hi. The data in the multiplexer are selected with inputs S3
and S2. The other two data inputs to the multiplexer receive inputs Ai-1 for
the shift-right operation and Ai+1 for the shift-left operation. Note that the
diagram shows just one typical stage. The circuit of Fig.2.13 must be
repeated n times for an n-bit ALU. The output carry Ci+1 of a given
arithmetic stage must be connected to the input carry Ci of the next stage in
sequence. The input carry to the first stage is the input carry Cin, which
provides a selection variable for the arithmetic operations.
Fig. 2.13: One stage of arithmetic logic shift unit
The circuit whose one stage is specified in Fig.2.13 provides eight arithmetic
operations, four logic operations, and two shift operations. Each operation is
selected with the five variables S3, S2, S1, S0, and Cin. The input carry Cin is
used for selecting an arithmetic operation only.
Table 2.8 lists the 14 operations of the ALU. The first eight are arithmetic
operations and are selected with S3S2=00. The next four are logic
operations and are selected with S3S2=01. The input carry has no effect
during the logic operations and is marked with don’t-care x’s. The last two
operations are shift operations and are selected with S3S2=10 and 11. The
other three selection inputs have no effect on the shift.
Operation Select Operation Function
S3 S2 S1 S0 Cin
0 0 0 0 0 F = A Transfer A
0 0 0 0 1 F = A + 1 Increment A
0 0 0 1 0 F = A + B Addition
0 0 0 1 1 F = A + B + 1 Add with carry
0 0 1 0 0 F = A + B’ Subtract with borrow
0 0 1 0 1 F = A + B’ + 1 Subtraction
0 0 1 1 0 F = A – 1 Decrement A
0 0 1 1 1 F = A Transfer A
0 1 0 0 X F = A ∧ B AND
0 1 0 1 X F = A ∨ B OR
0 1 1 0 X F = A ⊕ B XOR
0 1 1 1 X F = A’ Complement A
1 0 X X X F = shr A Shift right A into F
1 1 X X X F = shl A Shift left A into F
Table 2.8: Function Table for Arithmetic Logic Shift Unit
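Table 2.8 can be modeled behaviorally at the word level. This is only a sketch: the real circuit is one stage per bit with a carry chain, and the 4-bit width and function name here are assumptions:

```python
def alu(s3, s2, s1, s0, cin, a, b, w=4):
    """Word-level model of the arithmetic logic shift unit of Table 2.8."""
    m = (1 << w) - 1
    if (s3, s2) == (0, 0):
        # arithmetic: second operand is 0, B, B', or all 1's; Cin is added
        y = (0, b, b ^ m, m)[2 * s1 + s0]
        return (a + y + cin) & m
    if (s3, s2) == (0, 1):
        # logic: Cin is a don't-care
        return (a & b, a | b, a ^ b, a ^ m)[2 * s1 + s0]
    if (s3, s2) == (1, 0):
        return a >> 1          # shift right A into F
    return (a << 1) & m        # shift left A into F

print(alu(0, 0, 0, 1, 0, 3, 5))  # 8  (addition)
print(alu(0, 0, 1, 0, 1, 7, 5))  # 2  (subtraction: A + B' + 1)
```

Note how the four arithmetic operand choices (0, B, B’, all 1’s) combined with Cin yield all eight rows of the arithmetic half of the table, including decrement, since A + all 1’s equals A – 1 modulo 2^w.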
2.9 Summary
This unit dealt with Register transfer language, Bus selection, Arithmetic
circuits, Logical circuits and Arithmetic logic shift unit in detail.
Self Assessment Questions
1. The operations executed on data stored in registers are called
__________.
2. The symbolic notation used to describe the microoperation transfers
among registers is called a __________________.
3. The register that holds an address for the memory unit is usually called
a _________________.
4. It is convenient to separate the control variables from the register
transfer operation by specifying a ______________.
5. A more efficient scheme for transferring information between registers in
a multiple-register configuration is a ______________.
2.10 Terminal Questions
1. What is Register Transfer Language? Explain.
2. Give and explain the block diagram of the bus system with four
registers.
3. Explain the various arithmetic microoperations performed in registers.
4. Give and explain one stage of logic circuit.
5. Give and explain one stage of arithmetic logic shift unit.
2.11 Answers
Self Assessment Questions:
1. microoperations
2. register transfer language (RTL)
3. memory address register (MAR)
4. control function
5. common bus system
Terminal Questions:
1. Refer Section 2.2
2. Refer Section 2.4
3. Refer Section 2.5
4. Refer Section 2.6
5. Refer Section 2.8
Unit 3 Basic Structure of a Digital Computer
Structure:
3.1 Introduction
Objectives
3.2 Mechanical and Electromechanical Ancestors
3.3 Structure of a Computer System
3.4 Arithmetic Logic Unit
3.5 Control Unit
3.6 Bus Structure
3.7 Von Neumann Architecture
3.8 Summary
3.9 Terminal Questions
3.10 Answers
3.1 Introduction
A computer is a digital device that is capable of computing and processing
data. Computers run the instructions that are given to them. Instructions are
low-level commands such as: add or subtract two numbers, or compare two
numbers and run one instruction if the comparison is true and another if it
is false.
The list of instructions is called a program and the internal storage is called
memory. Information fed to a computer can be categorized as either
instruction or data. Instructions are the explicit commands that govern
the transfer of information within the computer as well as between the
computer and its I/O devices and specify the operations to be
performed.
A computer is defined as a machine for manipulating data according to a list
of instructions. Earlier the term "computer" referred to a person who
performed numerical calculations often with the aid of a mechanical
calculating device.
Objectives:
By the end of Unit 3, you should be able to:
1. Define mini computers, micro computers, main frames and super
computers.
2. Explain the basic structure of the computer.
3. Discuss the Central Processing Unit of the computer
4. Explain the bus or interconnection system.
3.2 Mechanical and Electromechanical ancestors
The first machine to attract widespread attention was built in 1642, by a
French philosopher and scientist B. Pascal. This machine was a
mechanical counter for carrying operations like addition and subtraction. It
consisted of two sets of gears with teeth (counter wheels) for representing
decimal numbers. Each gear had 10 decimal digits engraved on it. The
position of the dial indicated the decimal value. Each set of dials was used
to temporarily hold a number like a register. One special register was
considered like an accumulator which held the running total. There was one
more register which was used to enter a number to be added or subtracted
from the accumulator. Thus when the machine was set in motion, the
numbers in the two sets of dials were added, with the result appearing in the
accumulator. Two main technical innovations were:
1. A ratchet device to transfer a carry automatically from one place to next.
2. A means of storing a negative number referred as complement
representation.
By means of complement number representation the machine could
accomplish both addition and subtraction.
In 1671, the German mathematician and philosopher Gottfried Leibniz
constructed a calculator that could perform multiplication and division along
with addition and subtraction. Leibniz's machine performed these additional
operations in a repetitive, step-by-step fashion using chains and pulleys. This
machine was the forerunner of many machines that came to be known as
four-function calculators.
In 1750, punched cards were developed to specify weaving patterns in
textile technology; the symbols stored on the cards were used to represent
instructions. After a series of refinements, Joseph Marie Jacquard improved
the textile loom in 1801. His machine used a series of punched paper cards:
all the power was supplied mechanically, and all the control came from the
cards as they moved through the loom apparatus. The presence or absence
of a hole dictated the movements of parts of the loom, so the cards served
as a template that allowed the loom to weave intricate patterns
automatically. The loom was thus a programmable process-control machine
with the "program" supplied on punched cards. The resulting Jacquard loom
was an important step in the development of computers because the use of
punched cards to define woven patterns can be viewed as an early, albeit
limited, form of programmability.
The ideas behind the general purpose programmable computer were
developed by Charles Babbage in the 19th century. He designed two
machines: Difference Engine and Analytical Engine.
Difference Engine: The difference engine was designed to calculate the
entries of a table automatically and transfer them via steel punches to an
engineer's plate from which the tables could be printed. The only arithmetic
operation performed by this machine was addition. However using a
mathematical technique called the method of finite differences, a large number
of useful formulas could be computed either exactly or approximately using
only additions. The formulas included polynomials and trigonometric
functions like the sine function. The structure of Babbage's Difference Engine
is as shown in figure 3.1 below.
[Figure: a chain of registers, each adjacent pair connected by an adder,
with initial values loaded into the registers]
Figure 3.1: Structure of Babbage’s Difference Engine
The difference engine consisted of a number of mechanical registers. Each
register consisted of a set of counter wheels and could store a decimal
number. Pairs of adjacent registers were connected by an adding
mechanism. To compute results, initial values were loaded into the registers.
When the difference engine was driven by a suitable motor it could then
perform a series of steps to give an answer. This machine could
accommodate third degree polynomials and 15-digit numbers. Babbage
proposed to build a machine that would accommodate sixth degree
polynomials and 20-digit numbers but could not succeed as the British
government withdrew its support. In the meantime he came up with a more
powerful and ambitious machine which he called an Analytical engine.
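The method of finite differences that the Difference Engine mechanized reduces polynomial evaluation to repeated addition, exactly the chain of register-adder pairs described above. An illustrative sketch tabulating f(x) = x² + 2 (the sample polynomial is an assumption):

```python
# Registers hold the current value, its first difference, and its (constant)
# second difference; every step is just an addition between adjacent registers.
value, d1, d2 = 2, 1, 2   # f(0) = 2; f(1) - f(0) = 1; second difference = 2

table = []
for x in range(5):
    table.append(value)
    value += d1           # next table entry by addition only
    d1 += d2              # next first difference by addition only

print(table)  # [2, 3, 6, 11, 18], i.e. x**2 + 2 for x = 0..4
```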
Analytical engine: This machine was designed to be a general purpose
device that is capable of performing any mathematical operation
automatically. The Analytical engine's structure is as shown in Figure 3.2
below.
[Figure: the Store (memory) and the Mill (arithmetic unit), driven by
operation cards and variable cards carrying the program instructions, with
output to a printer and card punch]
Figure 3.2: Structure of Babbage’s analytical engine
The Analytical engine consists of five units of which two main units are the
Store and the Mill. To control the sequence of operations, punch cards
were used which were of the type developed earlier for Jacquard's loom.
Each of these components is described below:
The Store: a memory unit comprising sets of counter wheels.
The Mill: corresponds to a modern Arithmetic Logic Unit. It was
capable of performing the four basic arithmetic operations. It operated on
pairs of mechanical registers and produced a result stored in another
mechanical register, all of which were held in the Store (the memory
defined above).
The punch cards: constituted the computer program and were divided
into two groups:
o Operation cards: used to control the operation of the Mill. Each
operation card selected one of the four possible operations (that is,
addition, subtraction, multiplication or division) at each step in the
program.
o Variable cards: used to select the memory locations to be used by a
particular operation, that is, the source of the input operands and the
destination of the results.
Output: a printer or a card punch device, so that output data could be
either printed on a printer or punched on cards.
Large-scale automated data processing of punched cards was performed
for the US Census in 1890 by tabulating machines designed by Herman
Hollerith and manufactured by the Computing Tabulating Recording
Corporation, which later became IBM.
During the first half of the 20th century, many scientific computing needs
were met by increasingly sophisticated analog computers. However, these
were not programmable and generally lacked the versatility and the
accuracy of modern digital computers.
Computers take numerous physical forms. Early electronic computers were
the size of a large room, consuming as much power as several hundred
modern personal computers. Today, computers can be made small enough
to fit into a wrist watch and be powered from a watch battery. Society has
come to recognize the personal computer and the laptop computer as icons of
the information age. However, the most common form of computer in use
today is by far the embedded computer. Embedded computers are small,
simple devices that are often used to control other devices. For example,
they may be found in machines ranging from fighter aircraft to industrial
robots, digital cameras, and even children's toys.
The ability to store and execute programs makes computers extremely
versatile. Computers can range from personal computers to supercomputers
which are all able to perform the same computational tasks so long as time and
storage capacity are not considerations. The use of digital electronics and
more flexible programmability were important steps, in the development of
digital electronic computer.
EDSAC (Electronic Delay Storage Automatic Calculator) was one of the
first computers to implement the stored program (von Neumann)
architecture. Several developers came up with a far more flexible and
elegant design, which came to be known as the stored program
architecture or von Neumann architecture. This design was first formally
described by John von Neumann in the paper "First Draft of a Report on the
EDVAC", published in 1945. Nearly all modern computers implement some
form of the stored program architecture.
The modern definition of a computer is an electronic device that
performs calculations on data, presenting the results to humans or
other computers in a variety of useful ways. A computer is a complex
system that contains millions of elementary electronic components. Hence
a computer can be considered hierarchic in nature. A hierarchic system
is a set of interrelated subsystems, each again hierarchic in nature, until we
reach some lowest level of elementary subsystem. The hierarchic nature of
a complex system is essential to both its design and its description. At
each level the designer is concerned with the structure and function.
Definitions of few terms used in this course
Structure: It defines the way in which the components of a computer
are interrelated.
Function: It defines the operation of each individual component as a
part of the structure.
Architecture: It refers to those attributes of a computer system which
are visible to a programmer.
Organization: It refers to the operational units of a computer and their
interconnections, and how they implement the architecture of the system.
Classification of Computers
Computers can be classified into many types based on their size, speed
and cost.
The smallest machines are called microcomputers. For example, a
personal computer, that is widely used in schools, homes and business
offices.
The next higher level machine is called a minicomputer. Minicomputers
are widely used in payroll, scientific computing applications etc.
Mainframes are used for business data processing, when computing
and storage capacity is larger than what the minicomputers can handle.
Supercomputers are used for large-scale numerical calculations such
as weather forecasting, missile launching, and aircraft simulation.
We usually describe the computer system using a top-down approach, which is
clearest and most effective. In this unit we describe the structure and
function of the major components of a computer system, and then proceed
successively to the lower layers of the hierarchy in the coming units of
this book.
3.3 Structure of a Computer System
A computer is an entity that interacts in one way or another with its
external environment. All of its linkages to external environment are
classified as peripheral devices or communication lines. In particular,
the basic model of a computer is as shown in figure 3.3. It consists of four
main components, Central Processing Unit (CPU), Memory, Input /
Output and a Bus. A bus can be a wire or a communication line; in
general it can be referred to as a system interconnection.
[Figure: the CPU (ALU and control unit), memory and I/O interface
connected by a bus interface; peripherals shown include a keyboard, CRT,
disk and network]
Figure 3.3: Functional units of a computer
CPU: This is the computational unit and is the computer's heart. This
entity controls the operations of the computer and performs its data
processing functions. It is usually referred to as a processor. All the
actions of all components are controlled by the control unit of CPU.
CPU comprises two units called Arithmetic Logic Unit (ALU), where all
the arithmetic and logical operations are performed and the Control
Unit (CU) which coordinates with all other units for proper system
operation.
Memory: Memory stores the instructions, the data, and the results.
The memory unit is an integral part of a computer system; its main
function is to store the information needed by the system.
Input/output interface: These are used to move data between the computer
and its external environment. The external environment may be an
input or an output device such as a printer, display, or keyboard.
System interconnection: This is some mechanism that
provides for communication among the CPU, Memory and I/O. It is
referred to as the system bus.
Traditionally a computer system contains a single CPU, but multiprocessor
machines use multiple CPUs that share a single memory.
Central Processing Unit
The CPU is the heart, or core component, of the computer. Figure 3.4 shows
the basic functional components of the Central Processing Unit.
Figure 3.4: Basic Block diagram of CPU
Its major structural components are Control unit, ALU, Registers and CPU
interconnections.
Control Unit: It controls the operations of the CPU and hence the
computer system.
Arithmetic logic unit (ALU): It performs the computer's data processing
functions.
Registers: These form the internal memory of the CPU, providing
storage internal to the CPU.
CPU interconnections: It provides means for communication among
the control unit, ALU and registers of the CPU.
The CPU must carry out the tasks given below:
1. Read the given instructions
2. Decode them
3. Get operands for execution
4. Process the instruction
5. Give out / store the result
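The five tasks above can be sketched as a minimal fetch-decode-execute loop. This is a toy machine with a hypothetical instruction set (LOAD/ADD/STORE/HALT and a single accumulator), not any real ISA:

```python
# A minimal sketch of the five CPU tasks on a toy machine (hypothetical
# instruction set): fetch, decode, fetch operands, execute, store result.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", None),
          10: 5, 11: 7, 12: 0}   # addresses 10-12 hold data

pc, acc = 0, 0                    # program counter and accumulator
while True:
    opcode, operand = memory[pc]  # 1. read (fetch) the instruction
    pc += 1                       # remember where the next instruction is
    if opcode == "HALT":          # 2. decode: inspect the opcode
        break
    elif opcode == "LOAD":        # 3. get the operand from memory
        acc = memory[operand]
    elif opcode == "ADD":         # 4. process the instruction
        acc += memory[operand]
    elif opcode == "STORE":       # 5. give out / store the result
        memory[operand] = acc

print(memory[12])  # 12
```

Note how the program counter is the "memory" the CPU needs to remember where the next instruction comes from.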
To carry out these tasks the CPU needs to temporarily store some data. It
must remember the location of the last instruction so that it knows where
to get the next instruction. It needs to store instructions and data temporarily
while an instruction is being executed. In general, the CPU needs an internal
memory to store both instructions and data.
The CPU contains a handful of registers which act like local variables. The
CPU runs instructions and performs computations, mostly by the ALU. The
registers are the only memory the CPU has. Register memory is very fast
for the CPU to access, since it resides in the CPU itself.
However, the CPU has rather limited memory. All the local memory it uses
is in registers. It has very fast access to registers, which are on-board on the
CPU. It has much slower access to RAM.
Arithmetic logic unit (ALU)
The ALU is a collection of logic circuits designed to perform arithmetic
(addition, subtraction, multiplication, and division) and logical operations
(not, and, or, and exclusive-or). It's basically the calculator of the CPU.
When an arithmetic or logical operation is required, the values and
command are sent to the ALU for processing.
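The "values and command" model can be sketched as a table of operations, where the command selects which circuit (here, which function) processes the two inputs. The operation names are illustrative, not any particular CPU's opcodes:

```python
# A sketch of an ALU as a dispatch table: the command selects which
# "circuit" (function) combines the two input values.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "not": lambda a, b: ~a,       # unary: second input is ignored
}

def alu(command, a, b=0):
    """Send two values and a command to the ALU; return the result."""
    return OPS[command](a, b)

print(alu("add", 6, 7))            # 13
print(alu("xor", 0b1100, 0b1010))  # 6 (0b0110)
```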
Control Unit:
The purpose of the control unit is to control the system operations by
routing the selected data items to the selected processing hardware at the
right time. The control unit acts as the nerve centre for the other units.
Instruction decoder:
All instructions are stored as binary values. The instruction decoder
receives the instruction from memory, interprets the value to see what
instruction is to be performed, and tells the ALU and the registers which
circuits to energize in order to perform the function.
Registers:
The registers are used to store the data, addresses, and flags that are in
use, by the CPU.
Memory Units
Memory is basically a large array of bytes. The main function of a memory
unit is to store the information needed by the system. The information stored
can be data, instructions (that is, programs), or garbage. Memory
locations that do not contain any valid data hold arbitrary values, and
such contents are termed garbage data. The memory unit is an integral part
of a computer system.
The system performance is largely dependent on the organization, storage
capacity and speed of operation of the memory system. The CPU can read
or write to the memory, but it's much slower than accessing registers.
Nevertheless, you need memory because registers simply hold too little
information.
Most of the memory is in RAM, which can be thought of as a large array of
bytes as shown in figure 3.5. In an array, we can refer to individual elements
using an index. In computer organization, indexes are more commonly
referred to as addresses. Addresses are the numbers used to identify
successive locations. A word can be accessed by specifying its address and
a command that performs the storage or retrieval process.
Figure 3.5: System showing the registers and memory as array of bytes
The number of bits in each word is called the word length of the computer.
Large computers usually have 32 or more bits in a word, while the word
length of microcomputers ranges from 8 to 32 bits. The capacity of the
memory is one factor that decides the size of the computer. Data are usually
manipulated within the machine in units of words, multiples of words or
parts of words.
During execution the program must reside in the main memory. Instructions
and data are written into the memory or read out from the memory under the
control of a processor. Most memory is byte-addressable, meaning that
each address refers to one byte of memory.
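The array-of-bytes view can be sketched directly. Here a small `bytearray` stands in for RAM, each index is a byte address, and a 32-bit word at address A spans the four bytes A..A+3. The little-endian byte order is an assumption for the illustration:

```python
# Memory as a large array of bytes: each address names one byte
# (byte-addressable). A 32-bit word at address A occupies the four
# bytes A..A+3; we assemble it little-endian here (an assumption).
ram = bytearray(256)          # 256 bytes of "RAM"

ram[100] = 0x78               # store the word 0x12345678 at address 100
ram[101] = 0x56
ram[102] = 0x34
ram[103] = 0x12

def read_word(addr):
    """Read a 32-bit little-endian word starting at byte address addr."""
    return (ram[addr] | ram[addr + 1] << 8 |
            ram[addr + 2] << 16 | ram[addr + 3] << 24)

print(hex(read_word(100)))    # 0x12345678
```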
The bulk of the memory resides in a separate device called RAM, usually
called physical memory. RAM stores programs as well as data. The CPU
fetches an instruction from RAM into a register referred to as the
instruction register. The CPU then determines what instruction it holds and
executes it. Executing the instruction may require loading data
from RAM to the CPU or storing data from the CPU to RAM.
The time required to access one word is called the memory access time.
Classification of Memory system of a Computer
Memory system of a computer can be broadly classified into four groups.
Internal Memory
Internal memory refers to a set of CPU registers. These serve as
working memory, storing temporary results during the computation
process. They form a general purpose register file for storing the data as
it is processed. Since the cost of these registers is very high only few
registers can be used in the CPU.
Primary Memory
Primary memory, also called main memory, operates at
electronic speeds. The CPU can directly access the program stored in the
primary memory. Main memory consists of a large number of
semiconductor storage cells, each capable of storing one bit of
information. A word is a group of these cells. Main memory is organized
so that the contents of one word, containing n bits, can be stored or
retrieved in one basic operation.
Secondary Memory
This memory type is much larger in capacity and also much slower than
the main memory. Secondary memory stores system programs, large
data files and the information which is not regularly used by the CPU.
When the capacity of the main memory is exceeded, the additional
information is stored in the secondary memory. Information from the
secondary memory is accessed indirectly through the I/O programs that
transfer the information between the main memory and secondary
memory. Examples of secondary memory devices are magnetic hard
disks and CD-ROMs.
Cache Memory
The performance of a computer system will be severely affected if the
speed disparity between the processor and the main memory is
significant. The system performance can be improved by placing a small,
fast-acting buffer memory between the processor and the main memory.
This buffer memory is called cache memory. The cost of this memory is
very high.
Input / Output and I/O Interface
Any movement of information from or to the computer system is considered
as Input/Output. The CPU and its supporting circuitry provide I/O methods.
The input and output units are usually combined under the term input-output
unit (I/O). For example, consider a video terminal, which consists of a
keyboard for input and a cathode ray tube display for output.
There are a wide variety of peripherals which deliver different amounts of
data, run at different speeds and present data in different formats. All I/O
peripherals are slower than the CPU and RAM. Hence I/O units need proper
I/O interfaces.
Input Devices:
The computer accepts coded information through the input unit, which has
the capability of reading the instructions and data to be processed.
commonly used input device is the keyboard of a video terminal. This is
electronically connected to the processing part of a computer. The keyboard
is wired such that whenever a key is pressed the corresponding letter or
digit is automatically translated into its corresponding code and is directly
sent either to memory or the processor.
Output Devices
Output unit displays the processed results. Examples are video terminals
and graphic displays.
I/O devices do not alter the information content or the meaning of the data.
Some devices can be used as output only, e.g. graphic displays.
Following are the Input / Output techniques
o Programmed
o Interrupt driven
o Direct Memory Access (DMA)
System Interconnection / BUS
“A bus is a communication pathway connecting two or more devices.”
A key characteristic of a bus is that it is a shared transmission medium.
Multiple devices connect to the bus, and a signal transmitted by any one
device is available for reception by all other devices attached to the bus. If
two devices transmit during the same time period, their signals will overlap
and become garbled. Thus, only one device at a time can successfully
transmit. The communication between the external environment and CPU is
established through the System Bus. System bus is classified into three
different types, depending on whether it carries Data, Control, or Address
information and are indicated in figure 3.5.
3.4 ALU
[Figure 3.6 shows the CPU's internal components: the ALU, with its shifter,
complementer, arithmetic and Boolean logic, and status flags, connected to
the registers, instruction decoder and control unit over the internal CPU
bus.]
Figure 3.6: CPU showing internal components of ALU
The basic components of the ALU are shown in figure 3.6. The Arithmetic and
Logic Unit is the core of any processor. It performs calculations on the
input data given to it. The ALU is also used to transfer data between the
various registers, and it operates only on data held in the internal CPU
memory. The ALU is capable of performing arithmetic and Boolean operations.
An Arithmetic-Logic Unit or ALU can be considered as a combination of
various circuits in a single circuit that are used to execute data processing
instructions. The complexity of ALU is determined by the way in which its
arithmetic instructions are realized. Simple ALUs that perform fixed point
addition and subtraction, as well as logical operations can be realized by
combinational circuits.
An ALU realized using combinational logic is basically
constructed from AND, OR, and NOT gates; it is essentially an
implementation of a Boolean function. A generic diagram for a
combinational logic circuit of ALU is as shown in figure 3.7. In general, a
combinational logic circuit has inputs, which are divided into data inputs and
control inputs, and outputs. Control inputs tell the circuit what to do with the
data inputs.
Figure 3.7: A generic ALU that has 2 inputs and 1 output
A typical ALU has two input ports and a result port. It also has a
control input telling it which operation to perform: for example add,
subtract, and, or, etc. It also has additional output bits for condition
codes. Basically, these bits indicate some facts about the computation,
for example carry, overflow, negative, or zero result. These additional
output bits are together called the status bits, and they are used for
branching operations.
ALUs may be simple and perform only a few operations: Integer arithmetic
like add and subtract and Boolean logic like and, or, complement and left
Shift, right Shift, rotate. Such simple ALUs may be found in small 4- and 8-
bit processors.
Example: Consider a 32 bit ALU as shown in figure 3.8. It consists of
source 1 labeled as SRC1 and source 2 labeled as SRC2 as the two 32-bit
data inputs. It also has a control input, labeled C, which here signals
addition. These control bits tell the ALU to perform the addition operation
on the data inputs.
Figure 3.8: A 32 bit ALU
The result of the computation is sent to the output labeled DST. It is also 32
bits. There are some additional output bits labeled ST. In our example they
may indicate whether the output is zero, or has overflowed.
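The figure 3.8 example can be sketched as follows. The names SRC1, SRC2, DST and ST follow the text; the encoding of the control input C as a string, and the treatment of "overflow" as an unsigned carry-out of bit 31, are assumptions of this sketch:

```python
# A sketch of a 32-bit ALU: two 32-bit inputs (SRC1, SRC2), a control
# input C selecting the operation, a 32-bit result (DST) and status
# bits (ST) reporting zero and overflow.
MASK32 = 0xFFFFFFFF

def alu32(c, src1, src2):
    if c == "add":
        full = src1 + src2
    elif c == "sub":
        full = src1 - src2
    dst = full & MASK32                # keep only the low 32 bits
    st = {"zero": dst == 0,
          "overflow": full != dst}     # result did not fit in 32 bits
    return dst, st

dst, st = alu32("add", 0xFFFFFFFF, 1)  # wraps around to 0
print(hex(dst), st["zero"], st["overflow"])   # 0x0 True True
```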
More complex ALUs will support a wider range of integer operations like
multiplication and division, floating point operations like add, subtract,
multiply, divide. It can even compute mathematical functions like square root,
sine, cosine, log, etc.
To perform arithmetic and logic operations, the necessary operands are
transferred from memory to the ALU, where one of the operands is
stored temporarily in a register. This register is called the temporary
register. Each register stores one word of data.
3.5 Control Unit
The control unit is the portion of the processor that actually causes things
to happen. Its purpose is to control the system operations by
routing the selected data items to the selected processing hardware at the
right time; it acts as the nerve centre for the other units. This unit
decodes and translates each instruction and generates the necessary
enable signals for the ALU and other units. The control unit has two
responsibilities: instruction interpretation and instruction sequencing.
In instruction interpretation, the control unit reads an instruction from
memory, recognizes the instruction type, gets the necessary operands
and sends them to the appropriate functional unit. The signals necessary to
perform the desired operation are sent to the processing unit, and the
results obtained are sent to the specified destination.
In instruction sequencing, the control unit determines the address of the
next instruction to be executed and loads it into the program counter.
In general the I/O transfers are controlled by the software instructions that
identify both the devices involved and the type of transfer. But the actual
timing signals that govern the transfers are generated by the control circuits.
Similarly the data transfer between a processor and the memory is
controlled by the control circuits.
The operation of the computer can be summarized as below:
The computer accepts information through the input unit and transfers it
to the memory.
Information stored in the memory is fetched into arithmetic and logic unit
to perform the desired operations.
Processed information is transferred to the output unit.
All activities inside the machine are controlled by a control unit.
3.6 Bus Structure
A bus consists of one or more wires. There is usually a bus that connects
the CPU to memory and to disk and I/O devices. Real computers usually have
several buses, even though the simple computer we have modeled has only
one bus, where we consider the data bus, the address bus, and the
control bus as parts of one larger bus.
The size of the bus is the number of wires in the bus. We can refer to
individual wires or a group of adjacent wires with subscripts. A bus can be
drawn as a line with a dash across it to indicate there's more than one wire.
The dash in it is then labeled with the number of wires and the designation
of those wires.
Figure 3.9: Representation of a 32 bit Bus
For example, consider the bus shown in figure 3.9. A slanted dash across
the horizontal line indicates that it is a bus carrying multiple wires.
The dash is labeled 32, which indicates that the number of wires in
the bus is 32, and the bus is labeled A31-0, which names the individual
32 wires from A0 to A31. We can then write, say, A10-0 or A15-9 to refer to
some subset of the wires.
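Treating the 32 bus wires as the bits of an integer, a subset of wires such as A10-0 or A15-9 is just a shift and a mask. This is a software sketch of the notation, not hardware:

```python
# The subscript notation A31-0, A10-0, A15-9 names groups of wires.
# With the bus value held as an integer, extracting wires hi..lo is a
# shift followed by a mask.
def wires(value, hi, lo):
    """Extract wires A[hi]..A[lo] from a 32-bit bus value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

a = 0b1010_1100_0000_0000_0011_1111_1010_0101   # a sample 32-bit bus value
print(bin(wires(a, 10, 0)))   # the low 11 wires, A10-0
print(bin(wires(a, 15, 9)))   # the 7 wires A15-9
```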
A bus allows any number of devices to hook up to the bus. Devices
connected to the bus must share the bus. Only one device can write to it at
a time. One alternative to using a bus is to connect each pair of devices
directly. Unfortunately, for N devices, this requires about N² connections,
which may be too many. Most devices have a fixed number of connections
which doesn't permit dedicated connections to other devices. A bus doesn't
have this problem.
Data, Address, and Control Busses
There are usually 3 kinds of buses. There's a 32-bit data bus, which is used
to write or read 32 bits of data to or from memory. There's a 32-bit address
bus for the CPU to specify which address to read or write from or to memory.
Finally, there's a control bus, which may consist of a single wire or
multiple wires, to allow the CPU and memory to communicate.
For example a control signal is required to indicate when and whether a
read or write is to be performed. To support two 32-bit buses, both the
CPU and memory require 64 pins or connections: 32 for data and 32 for
address. Earlier there was a shortage of pins, and hence it was necessary to
multiplex the address and data buses: multiplexing uses the same wires as
both the address and the data bus.
There are other kinds of buses that are used primarily for I/O devices, like
USB. These are mostly high-speed buses for external devices.
3.7 Von Neumann Architecture
Figure 3.10: Structure of the IAS Computer
The IAS machine was the first digital computer in which the von Neumann
architecture was employed. The general structure of the IAS computer is
shown in figure 3.10:
A main memory, which stores both instructions and data
An arithmetic and logic unit (ALU) capable of operating on binary data
A control unit, which interprets the instructions in memory and causes
them to be executed
Input and Output (I/O) equipment operated by the control unit
The von Neumann Architecture is based on three key concepts:
1. Data and instructions are stored in a single read-write memory.
2. The content of this memory is addressable by location, without regard to
the type of data contained therein.
3. Execution occurs in a sequential fashion unless explicitly modified from
one instruction to the next.
[Figure 3.11 shows the CPU with its registers (PC, IR, MAR, MBR, I/O AR,
I/O BR) and ALU, the memory holding instructions and data, and an I/O
module. PC = Program Counter, IR = Instruction Register, MAR = Memory
address register, MBR = Memory buffer register, I/O AR = I/O address
register, I/O BR = I/O buffer register.]
Figure 3.11: Computer components of the von Neumann architecture
The CPU consists of various registers as listed in figure 3.11. They are
1. Program Counter (PC): It contains an address of an instruction to be
fetched.
2. Instruction Register (IR): It contains the instruction most recently
fetched.
3. Memory Address Register (MAR): It contains the address of a location
in memory.
4. Memory Buffer Register (MBR): It contains a word of data to be written
to memory or the word most recently read.
5. I/O Address Register (I/O AR): It contains the address of an I/O device.
6. I/O Buffer Register (I/O BR): It contains a word of data to be written
to an I/O device.
Any instruction to be executed must be present in the system memory. The
instruction is read from the memory location pointed to by the PC and
transferred to the IR through the data bus. The instruction is decoded, and
the data is brought to the ALU from memory or a register. The ALU then
performs the required operation on the data and stores the result in a
special register called the Accumulator. The whole sequence of actions is
controlled by signals generated by the control unit. Thus the Accumulator is
a special-purpose register designated to hold the result of an operation
performed by the ALU.
3.8 Summary
This unit begins by providing a historical perspective on computers. It then
discusses the classification of computers and the various components of
a computer, followed by memory, which is considered a crucial
component. Finally the unit concludes with a brief discussion of the ALU
and the bus architecture.
Self Assessment Questions:
1. ____________ are used for business data processing, when computing
and storage capacity are larger than what the minicomputers can handle.
2. ____________ refers to those attributes of a computer system which are
visible to a programmer.
3. A ____________ is an entity that interacts in some or the other way with
its external environment.
4. Bus can also be a wire or a communication line or in general it can be
referred to as a ____________.
5. ____________ performs the calculations on the input data.
3.9 Terminal Questions
1. Explain the functional units of a basic computer with a neat diagram
2. Explain Von Neumann Architecture.
3. Explain the functions of ALU.
4. Explain the System Bus structure.
5. Give a brief account of early days of computers.
6. Discuss the various types of memories.
3.10 Answers
Self Assessment Questions:
1. mainframes
2. architecture
3. computer
4. system interconnection
5. arithmetic logic unit (ALU)
Terminal Questions:
1. Refer Section 3.3
2. Refer Section 3.7
3. Refer Section 3.4
4. Refer Section 3.6
5. Refer Section 3.2
6. Refer Section 3.3
Unit 4 CPU and Register Organization
Structure:
4.1 Introduction
Objectives
4.2 Registers
User-Visible Registers
Control and Status Registers
Program Status Word (PSW)
4.3 CPU Organization
Fundamental Computer Architecture
CPU organization in 8085 microprocessor
4.4 Register Organization of different machine
The Zilog Z8000 machine
Intel 8086 machine
Motorola 68000 machine
4.5 Instruction cycles
Basic instruction cycle
Basic instruction Cycle state diagram
4.6 Summary
4.7 Terminal Questions
4.8 Answers
4.1 Introduction
As discussed in the previous unit a CPU consists of a set of registers that
function as a level of memory above Main memory and Cache memory. The
registers in the CPU are of two types:
User-visible Registers: These enable the machine or assembly
language programmer to minimize main memory references by use of
registers.
Control and Status Registers: These are used by the control unit to
control the operation of the CPU and by privileged operating system
programs to control the execution of programs.
Note: Sometimes there is no clear separation of registers into these
categories. For example, on some machines the register called the
Program Counter (PC) is user-visible, that is, the user can see the contents
of the PC, while on other machines it is invisible to the user.
Objectives:
By the end of Unit 4, the learners should be able to:
1. Explain the register organization of any basic computer.
2. Explain status word, program counter registers.
3. Discuss the register organization of Intel 8085 microprocessor and
compare with other machines.
4. Explain the different phases of basic instruction cycle.
4.2 Registers
User-Visible Registers
User-visible registers are those registers that can be referenced by the
user in machine language programs executed by the CPU. All CPU designs
provide a number of user-visible registers, which can be
categorized into the following types:
General Purpose Registers: They can be assigned a variety of
functions by the programmer. The use of general purpose registers may
be orthogonal or non-orthogonal.
o If any general purpose register can contain the operand for any
opcode, then the use of the general purpose registers is called
orthogonal.
o Sometimes the use of a general purpose register is restricted. For
example, there may be dedicated registers that are used for floating
point operations. The use of the general purpose registers is then
called non-orthogonal.
Data registers: They can only be used to hold data, and cannot be
employed in the calculation of an operand address.
Address registers: They may be used either in general purpose
addressing modes, or may be devoted to a particular addressing mode.
o Segment Pointers: In a machine CPU with segmented addressing,
a register holds the address of the base of a segment. There may be
multiple segment registers. For example, one for the operating
system, one for the current process.
For example, Intel 8086 family has different segments and they are referred
to as
1) code segment: where the code is written
2) data segment: where the data is placed
3) stack segment: acts as a stack
4) extra segment: an additional segment, typically used for extra data
Thus there may be multiple segment registers, for example one for the
operating system and one for the current process.
o Index Registers: These are used for indexed addressing, and may be
auto indexed.
o Stack Pointer: If there is user-visible stack addressing, then
typically the stack is in memory and there is a dedicated register that
points to the top of the stack. This allows implicit addressing; that is,
push, pop and other stack instructions need not contain an explicit
stack operand.
Condition codes: These registers are at least partially user-visible.
They are also referred to as flags. The bits of the flag register are set
according to the result of an operation. There can be a Zero flag, a Carry
flag, etc., which indicate whether the result is zero or whether the result
produced an overflow, respectively.
Example: Consider registers that are two bits long. The result of adding
the binary numbers 10 and 11 is 101. That is:
1) the destination register will hold the low two bits of the result, 01, and
2) the carry flag is set, as the result overflowed two bits.
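The two-bit example above can be sketched directly: the full sum 101 is split into the two-bit destination value and the carry flag:

```python
# Adding the two-bit values 10 and 11 in a two-bit machine: the full
# sum is 101; the destination register keeps the low two bits (01) and
# the carry flag records the overflow out of the register.
def add2(a, b):
    full = a + b                 # 0b10 + 0b11 = 0b101
    dest = full & 0b11           # destination register: low two bits
    carry = full > 0b11          # carry flag: result overflowed two bits
    zero = dest == 0             # zero flag: destination is all zeros
    return dest, carry, zero

dest, carry, zero = add2(0b10, 0b11)
print(format(dest, "02b"), carry, zero)   # 01 True False
```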
Control and Status Registers
These registers are employed to control the operation of the CPU.
Four registers are essential to instruction execution:
Program Counter (PC): It contains the address of an instruction to be
fetched.
Instruction Register (IR): It contains the instruction most recently
fetched.
Memory Address Register (MAR): It contains the address of a location
in memory.
Memory Buffer Register (MBR): It contains a word of data to be written
to memory or the word most recently read.
These four registers are used for the movement of data between the CPU
and memory.
Within the CPU, data must be presented to the ALU for processing. The
ALU may have direct access to the MBR and user-visible registers.
Alternatively, there may be additional buffering registers at the boundary to
the ALU; and these registers serve as input and output registers for the ALU
and exchange data with MBR and user-visible registers. This depends on
the design of the CPU and ALU.
Program Status Word (PSW):
It contains status information. PSW typically contains condition codes plus
other status information. Some of these may be user visible. Common flags
include:
Sign: contains the sign bit of the result of the last arithmetic operation.
Zero: set when the result is 0.
Carry: set if an operation resulted in a carry (addition) into or borrow
(subtraction) out of the high-order bit.
Equal: set if logical compare result is equality.
Overflow: used to indicate arithmetic overflow.
Interrupt enable/disable: used to enable or disable interrupts.
Supervisor: Indicates whether the CPU is executing in supervisor mode
or user mode. Certain privileged instructions can be executed only in
supervisor mode (e.g. the halt instruction), and certain areas of memory
can be accessed only in supervisor mode.
4.3 CPU organization
Figure 4.1: CPU with System Bus
As discussed earlier CPU which is the heart of a computer consists of
Registers, Control Unit and Arithmetic Logic Unit. The interconnection of
these units is achieved through the system bus as shown in figure 4.1.
The following tasks are to be performed by the CPU:
1. Fetch instructions: The CPU must read instructions from the memory.
2. Interpret instructions: The instructions must be decoded to determine
what action is required.
3. Fetch data: The execution of an instruction may require reading data
from memory or an I/O module.
4. Process data: The execution of an instruction may require performing
some arithmetic or logical operations on data.
5. Write data: The results of an execution may require writing data to the
memory or an I/O module.
Fundamental Computer Architectures
Here we describe the most common computer architectures, all of which
use the stored-program control concept.
The three most popular computer architectures are:
1) The Stack Machine
2) The Accumulator Machine
3) The Load / Store Machine
The Stack Machine
A stack machine implements a stack with registers. The operands of the
ALU are always the top two registers of the stack and the result from the
ALU is stored in the top register of the stack. Examples of the stack machine
include Hewlett-Packard RPN calculators and the Java Virtual Machine
(JVM).
Figure 4.2: The Stack Machine Architecture
The advantage of a stack machine is that it can shorten the length of
instructions since operands are implicit. This was important when memory
was expensive (20–30 years ago). Now, in Java, it is important since we
want to ship executables (class files) over the network.
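The stack machine idea can be sketched as a tiny interpreter: operands are implicit (the top two entries of the stack), so only push-style instructions carry an operand field. The instruction names are illustrative, not those of any real stack machine:

```python
# A sketch of a stack machine: the ALU operands are always the top two
# stack entries, and the result goes back on top of the stack.
def run_stack(program, memory):
    stack = []
    for op, *arg in program:
        if op == "push":          # push a constant
            stack.append(arg[0])
        elif op == "load":        # push memory[addr]
            stack.append(memory[arg[0]])
        elif op == "add":         # operands are implicit: top two entries
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":         # pop the result into memory[addr]
            memory[arg[0]] = stack.pop()
    return memory

mem = {"y": 32}                   # compute y = y + 10
run_stack([("load", "y"), ("push", 10), ("add",), ("pop", "y")], mem)
print(mem["y"])   # 42
```

Note how `add` needs no operand field at all, which is what keeps stack-machine instructions short.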
The Accumulator Machine
An accumulator machine has a special register, called an accumulator,
whose contents are combined with another operand as input to the ALU,
with the result of the operation replacing the contents of the accumulator.
accumulator = accumulator [op] operand;
Figure 4.3: The Accumulator Machine Architecture
In fact, many machines have more than one accumulator:
Pentium: 1, 2, 4 or 6 (depending on how you count)
MC68000: 16
In order to add two numbers in memory,
1) Place one of the numbers into the accumulator (load operand)
2) Execute the add instruction
3) Store the contents of the accumulator back into memory (store operand)
The Load/Store Machine
Registers: Provide faster access but are expensive.
Memory: Provides slower access but is less expensive.
A small amount of high speed memory (expensive), called a register file, is
provided for frequently accessed variables and a much larger but slower
memory (less expensive) is provided for the rest of the program and data.
(SPARC: 32 registers visible at any one time)
This is based on the principle of "locality": at a given time, a program
typically accesses a small number of variables much more frequently than
others.
Figure 4.4: The Load / Store Machine Architecture
The machine loads and stores the registers from memory. The arithmetic
and logic instructions operate with registers, not main memory, for the
location of operands.
Since the machine addresses only a small number of registers, the
instruction field to refer to a register (operand) is short; therefore, these
machines frequently have instructions with three operands:
ADD src1, src2, dest
Example machine instructions for y = y + 10
(Notation: [y] denotes the contents of memory location y, and &y its address.)

Stack Machine        Accumulator Machine      Load / Store Machine
Push [y]             Load [y]                 Load r0, [y]
Push 10              Add 10                   Load r1, 10
Add                  Store y                  Add r0, r1, r2
Pop y                                         Store r2, y
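The load/store sequence can be simulated with a small register file (r0–r2 here are illustrative names, not a real ISA):

```python
# Sketch of the load/store machine sequence for y = y + 10.
memory = {"y": 32}
reg = [0, 0, 0]                     # small register file: r0, r1, r2

reg[0] = memory["y"]                # Load r0, [y]
reg[1] = 10                         # Load r1, 10  (immediate value)
reg[2] = reg[0] + reg[1]            # Add  r0, r1, r2  (three register operands)
memory["y"] = reg[2]                # Store r2, y
print(memory["y"])                  # 42
```

Because register numbers are short, naming three operands in one instruction is affordable, which is why load/store machines favour three-operand arithmetic.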
CPU organization of 8085 microprocessor
In the following sections we will take a look at the CPU organization of the
Intel 8085. We will study the register organization and its relationship with
other components in this relatively simple processor. The Intel 8085 CPU
organization is shown in figure 4.5.
The CPU of 8085 is organized around a single 8-bit internal bus. Connected
to the Bus are the following components.
1. Accumulator: Used as a temporary buffer to store input to the ALU. It is
also user visible and is addressed in some instructions.
2. Temporary register: Used as the second input to the ALU, i.e., the input
other than the accumulator.
3. Flags: These are nothing but condition codes. The results of ALU
operations are reflected back in this register as flags.
Figure 4.5: Intel 8085 CPU organization
The different flag bits supported by the 8085 CPU are:
1. Z: It stands for Zero Flag. This bit is set when the result of ALU
operation is zero. That is result = zero implies Z=1 and result = non zero
implies Z=0.
2. S: It stands for Sign Flag. This bit is set when the result of ALU
operation is negative
That is result = negative implies S=1 and result = non negative implies
S=0
3. P: It stands for Parity Flag. This bit is set when the result of ALU
operation results in even parity. That is the result when expressed in
binary has even number of ones in it.
That is result contains even number of ones implies even parity P=1 and
result contains odd number of ones implies odd parity P=0
4. CY: It stands for Carry Flag. This bit is set when the result of ALU
operation results in overflow.
That is result contains overflow implies carry flag (CY) =1 and result
does not contain overflow implies CY=0
5. AC: It stands for Auxiliary Carry Flag. This bit is set when the ALU
operation produces a carry from the fourth bit to the fifth bit (i.e., from
the lower nibble to the upper nibble).
That is, a carry from the fourth to the fifth bit implies AC = 1, and no
such carry implies AC = 0.
4. ALU output: The result of an operation is placed on the bus.
5. Instruction register: It is loaded from the memory buffer register
(MBR) through the bus. The instruction is brought into the CPU through
the address/data buffer and the internal CPU bus to the instruction
register, where it is decoded and executed under the direction of the
control unit.
6. Register Array: It is a collection of registers that are user visible and
help to store the operands or result of the operation.
The register array contains five 16-bit registers: the B – C, D – E and H – L
register pairs, the stack pointer and the program counter.
1. The first three types of registers can be treated as a single 16 bit register
each or two 8-bit registers each. That is B, C, D, E, H, L are six registers
which are 8-bit long.
2. The stack pointer is a 16-bit register that points to the top of the stack
in memory. It is used for stack-oriented operations.
3. The program counter is a 16-bit register that points to the memory
location of the next instruction to be fetched. It is moved to the MAR
(the address buffer plus the address/data buffer) to begin fetching the
next instruction.
Any of these registers may be loaded, 8-bits at a time, from the internal bus
or transferred to the address buffer or address/data buffer. Also any of the
16-bit registers can be transferred to the MAR that connects to the external
address bus.
7. Address/data Buffer: This buffer connects to multiplexed bus lines.
Some of the time, this buffer acts as a memory buffer register (MBR) to
exchange data with the system bus. At other times it acts as the low
order 8 bits of a memory address register (MAR). This multiplexing frees
pins, which are then available for control signals to the system bus.
8. Address buffer: Used as the high-order 8 bits of the MAR.
Let us consider an instruction ADD B as an example to study the sequence
of actions that takes place.
This instruction causes the contents of register B to be added to the
contents of the accumulator.
To execute this instruction, the control unit moves the contents of
register B to the temporary register.
The ALU performs the add operation on its inputs.
The result is placed on the internal CPU bus and copied into the
accumulator.
The ALU also updates the flags to reflect the results of add.
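The flag update in this last step can be sketched as follows (illustrative Python, not the actual 8085 circuitry; the operand values are made up):

```python
# Sketch of deriving the five 8085 flags from an 8-bit addition.
def add_with_flags(a, b):
    total = a + b
    result = total & 0xFF                            # keep the low 8 bits
    flags = {
        "Z":  int(result == 0),                      # result is zero
        "S":  int(result & 0x80 != 0),               # bit 7 set => negative
        "P":  int(bin(result).count("1") % 2 == 0),  # even number of ones
        "CY": int(total > 0xFF),                     # carry out of the top bit
        "AC": int(((a & 0x0F) + (b & 0x0F)) > 0x0F), # carry from lower nibble
    }
    return result, flags

result, flags = add_with_flags(0x3A, 0xC6)           # 0x3A + 0xC6 = 0x100
print(result, flags)
```

With these operands the sum wraps to zero, so Z, P, CY and AC are all set while S is clear.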
4.4 Register Organization of different machines
It is instructive to examine and compare the register organizations of
comparable systems. In this section we discuss the register organization
of three 16-bit microprocessors that were developed at more or less the
same time.
The Zilog Z8000
Figure 4.6 depicts the register organization of the Z8000 machine. Only the
purely internal register structure is given; memory address registers are
not shown.
Figure 4.6: Zilog Z8000 register organization
The Z8000 has sixteen 16-bit general-purpose registers, which can be
used for data, addresses and indexing. The designers of this machine felt
that it was more useful to provide a regularized, general set of registers
than to save instruction bits by using special-purpose registers. How
functions are assigned to these registers is left to the programmer;
different applications may use different functional breakdowns.
A segmented address space uses a 7-bit segment number and a 16-bit
offset, so two registers are needed to hold a single address. There are two
other registers, called stack pointers, that are needed for stack operations:
one is used in system mode and the other in normal mode.
The Z8000 has five other registers related to program status. Two registers
hold the program counter and two hold the address of a program status
area in memory. A 16-bit register called the flag control word holds various
status flags and control bits.
Intel 8086
The approach taken by Intel for register organization is different than Z8000
machine and is as shown in Figure 4.7.
Figure 4.7: Intel 8086 register organization
In this machine every register is a special-purpose register, though some
registers also serve as general-purpose registers. The 8086 contains four
16-bit data registers that are accessible on a byte or 16-bit basis. It also
has four 16-bit pointer and index registers. The data registers
can be used as general-purpose registers in some instructions; in others,
the registers are used implicitly.
Example: A multiply instruction always uses the accumulator. The four
pointer registers, each holding a segment offset, are also used implicitly in
a number of operations. There are also four segment registers. Three of these segment
registers are used in a dedicated, implicit way to point to the segment of the
current instruction, a segment containing data and a segment containing a
stack. This type of structure is useful in branch operations. The dedicated
and implicit uses provide for compact encoding at the cost of reduced
flexibility. The 8086 also includes an instruction pointer and a set of 1-bit
status and control flags.
The Motorola MC68000
DATA REGISTERS ADDRESS REGISTERS
D0 A0
D1 A1
D2 A2
D3 A3
D4 A4
D5 A5
D6 A6
D7 A7
PROGRAM STATUS
Program counter
Status register
Figure 4.8: MC68000 register organization
This machine uses a structure that falls between the Zilog Z8000 and Intel
8086. The register organization of MC68000 is as shown in figure 4.8
The MC68000 machine partitions its 32-bit registers into eight data
registers and nine address registers. The data registers are used for data
manipulation, and in addressing only as index registers. The width of the
data registers allows 8-bit, 16-bit and 32-bit data operations, depending
upon the opcode.
The address registers contain 32-bit addresses; the machine does not
support segmentation. Two of the address registers are used as stack
pointers, one for user programs and one for the operating system,
depending upon the current execution mode. Both stack pointers are
numbered seven (A7), since only one can be in use at a time. The
MC68000 also has a program counter and a status register, as in the
other two machines. The program counter is a 32-bit register and the
status register is 16 bits wide. Like the Zilog team, the Motorola designers
favoured a regular instruction set with no special-purpose registers; for
code efficiency they divided the registers into functional groups, saving
one bit in each register specifier.
4.5 Instruction Cycles
The basic function performed by a computer is program execution. The
program to be executed consists of a set of instructions stored in memory.
The CPU does the actual work by executing the instructions written in the
program.
Basic Instruction cycle
To understand the ways the major components of a computer or rather CPU
interact to execute a program, consider the flow chart as shown in figure 4.9.
Figure 4.9: Basic instruction cycle
The processing required for a single instruction is called an instruction
cycle. Referring to the above figure 4.9, the basic instruction cycle can be
assumed to consist of two steps:
1. The CPU reads or fetches instructions from memory one at a time.
2. Then this single instruction is executed.
The program execution consists of repeating these two steps of instruction
fetch and instruction execution until the end of the program.
Thus the instruction cycle consists of a fetch cycle and an execute cycle.
The execution of an instruction itself may involve a number of steps. The
instruction fetch is a common operation for every instruction, and consists
of reading an instruction from a location in memory. The execution may
involve several operations, depending upon the nature of the instruction.
As we have seen, every CPU has a register called the program counter
(PC) that keeps track of the instruction to be fetched next from memory.
The CPU increments the PC after each instruction fetch so that it fetches
the next instruction in sequence.
The fetched instruction is stored in a register in the CPU known as
instruction register. The instruction is in the form of binary code that
specifies what action has to be taken by the CPU. The CPU interprets the
instruction and performs the required action.
Example: Suppose PC = 300, i.e., the PC points to location 300 of
memory. A program has instructions at locations 300, 303 and 304, with a
Halt instruction at 306. Discuss the steps involved in the execution of this
program.
Solution:
1. First, the instruction present at location 300 in memory is loaded into
the IR. At the same time the PC is incremented by 3, as the next
instruction is at 303.
2. Depending on the instruction that is decoded, the CPU finds that
operands are to be read from memory (the next instruction being at
303 implies that the operands of the first instruction are placed at 301
and 302).
3. Then the instruction is executed.
4. Then the instruction at 303 is read into the IR and the PC is updated to 304.
5. The instruction is decoded and executed. (This time no operands are
in memory, since the next instruction is at 304.)
6. The third instruction is read into the IR from location 304 (PC = 304),
and the PC is updated to 306.
7. The instruction is decoded and may have an operand in memory, so
the operand is fetched from location 305.
8. Then the instruction is executed.
9. Finally, the next instruction is at 306, so it is read into the IR and
PC = 307.
10. This instruction is decoded and found to be Halt.
11. Executing it terminates the program.
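The fetch–execute loop of this worked example can be sketched as follows. The instruction encoding (an opcode name plus an operand count) and the opcodes ADD/NOP/HALT are invented purely for illustration:

```python
# Sketch of the fetch-execute cycle for the program at locations 300..306.
memory = {
    300: ("ADD", 2), 301: 5, 302: 7,   # instruction with two memory operands
    303: ("NOP", 0),                   # instruction with no operands
    304: ("ADD", 1), 305: 9,           # instruction with one memory operand
    306: ("HALT", 0),
}
pc, acc, trace = 300, 0, []

while True:
    ir = memory[pc]                    # fetch the instruction into the IR
    opcode, n_operands = ir            # decode
    operands = [memory[pc + 1 + i] for i in range(n_operands)]
    pc = pc + 1 + n_operands           # update PC past instruction and operands
    trace.append((opcode, pc))
    if opcode == "ADD":                # execute
        acc += sum(operands)
    elif opcode == "HALT":
        break
print(trace)
```

Running this, the PC takes the values 303, 304, 306 and finally 307, matching steps 1–11 above.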
The following is one view of the instruction cycles:
Fetch the instruction:
Place it in a holding area called IR.
Update the program counter (PC).
Execute the instruction:
Decode the instruction:
o Perform address translation.
o Fetch operands from memory.
Perform the required calculation.
Store results in registers or memory.
Set status flags attached to the CPU.
The required action performed by the CPU falls into four categories.
1. CPU-Memory: Data may be transferred from the CPU to memory or
from memory to CPU
2. CPU-I/O: Data may be transferred to/from the outside world by
transferring between CPU and I/O module.
3. Data processing: The CPU may perform some arithmetic or logic
operations on data.
4. Control: An instruction may specify that the sequence of execution be
altered; for example, a jump instruction.
Consider, for example, a program written from location 300 to 320 in
memory, where the instruction at 305 is "jump to 310". When the program
is executed, the instructions located at 300 to 305 are executed in
sequence, one at a time. After the execution of the jump instruction, the
CPU has updated PC = 310, so the next instruction fetched will be from
310, and so on.
Basic Instruction Cycle State diagram
The detailed states occurring during the fetch and execute cycles are
shown in figure 4.10. A few of them occur only once per instruction, while
others may occur multiple times.
Figure 4.10: Instruction Cycle State Diagram
The different possible states occurring in Instruction cycle are:
Instruction Address Calculation (IAC): Determine the address of the
next instruction to be executed. Usually, this involves adding a fixed
number to the address of the previous instruction. For example, if each
instruction is 16 bits long and memory is organized into 16-bit words,
then add 1 to the previous address. If, instead, memory is organized as
individually addressable 8-bit bytes, then add 2 to the previous address.
Instruction Fetch (IF): Read instruction from its memory location into
the processor.
Instruction Operation Decoding (IOD): Analyze instruction to
determine type of operation to be performed and operand(s) to be used.
Operand Address Calculation (OAC): If the operation involves
reference to an operand in memory or available via I/O, then determine
the address of the operand.
Operand Fetch (OF): Fetch the operand from memory or read it in, from
I/O.
Data Operation (DO): Perform the operation indicated in the instruction.
Operand Store (OS): Write the result into memory or out to I/O.
During the instruction cycle, instruction address calculation, instruction
fetch and instruction operation decoding are performed only once per
instruction, but operand address calculation and operand fetch may
happen multiple times. The data operation is performed once, and
operand store (together with another operand address calculation) may
be performed zero or more times, depending on the instruction.
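The repetition pattern described above can be expressed as a short sketch that lists the states visited by one instruction (the operand counts passed in are illustrative):

```python
# Sketch of the instruction-cycle state diagram as a flat state sequence.
def instruction_states(n_fetch_operands, n_store_operands):
    states = ["IAC", "IF", "IOD"]      # performed once per instruction
    for _ in range(n_fetch_operands):
        states += ["OAC", "OF"]        # may repeat, once per operand fetched
    states.append("DO")                # the data operation itself, once
    for _ in range(n_store_operands):
        states += ["OAC", "OS"]        # zero or more result stores
    return states

# An instruction fetching two operands and storing one result:
print(instruction_states(2, 1))
```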
4.6 Summary
We know that CPU consists of a set of registers that function as a level of
memory. Here we have discussed the different types of registers like user
visible registers, control registers or flag registers and so on. We have also
introduced the basic instruction cycle so as to get a feel for how exactly
these registers are used in the execution of an instruction. As a case study
we have seen the detailed register organization of a few machines: the
Zilog Z8000, Intel 8086 and Motorola MC68000 microprocessors.
Self Assessment Questions
1. CPU consists of a set of registers that function as a level of memory
above ______________.
2. ______________ enable the machine or assembly language
programmer to minimize main memory references by use of registers.
3. Flags of 8085 are nothing but ______________.
4. ______________ number of flag bits is defined in 8085.
5. The processing required for a single instruction is called an
______________ .
4.7 Terminal Questions
1. List and explain the various condition flags of the Intel 8085
microprocessor.
2. Explain the register organization of 8086.
3. Compare the register organization of 8085, Z8000 and MC68000.
4.8 Answers
Self Assessment Questions:
1. main memory and cache memory
2. user visible registers
3. conditional codes
4. five
5. instruction cycle
Terminal Questions:
1. Refer Section 4.3
2. Refer Section 4.4
3. Refer Section 4.4
Computer Organization and Architecture Unit 5
Sikkim Manipal University - DDE Page No. 122
Unit 5 Interconnection Structures
Structure:
5.1 Introduction
Objectives
5.2 Types of exchange of information
Modules of a System
Different types of transfers
5.3 Types of Buses
5.4 Elements of Bus Design
Bus Types
Method of arbitration
Bus Timing
Bus width
Bus Speed
5.5 Bus Structure
Single Bus System
Two Bus Organization
The Bus Standard
5.6 Summary
5.7 Terminal Questions
5.8 Answers
5.1 Introduction
Different types of exchanges are required for communication in a computer.
Figure 5.1 shows the Memory Module, I/O Module and CPU Module and
also indicates the major forms of inputs and outputs of these modules.
Objectives:
By the end of Unit 5, you should be able to:
1. Discuss the different means of data transfer.
2. Define BUS, and explain its structure.
3. Explain the elements of Bus design.
4. Explain with neat sketches single and two bus structure.
5.2 Types of exchange of information
A computer consists of three basic types of modules (processor, memory,
I/O) that communicate with each other. Thus, there must be paths for
connecting the modules. The collection of paths is called the
"interconnection structure". The design of this structure will depend on
the exchanges that must be made between modules.
Modules of a system
Figure 5.1: a) Memory module
A memory module consists of N words of equal length. Each word is
assigned a unique numerical address (0, 1, ..., N-1). Figure 5.1(a) shows
the possible inputs and outputs of a memory module. A word of data can
be read from or written into the memory depending on whether the control
signal is Memory
Read (MR) or Memory Write (MW). The location of the memory is provided
by the input called as Address.
Example: List the inputs and outputs that are necessary to write the data to
the memory.
Solution:
The inputs required are:
1. The control signal, which is Memory Write (MW),
2. The address, which gives the location in memory where the data is to
be written, and
3. The data itself.
No output is required for this particular task.
The sequence of operations that takes place:
1. The memory location is selected using the input address.
2. Since the control signal is MW, the input data is placed in that location.
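The memory-module interface above can be sketched as a small class (a simplification: real memory is driven by bus signals, not method calls, and the class name is illustrative):

```python
# Sketch of the memory module: address input, data input/output,
# and MR/MW control signals selecting the operation.
class MemoryModule:
    def __init__(self, n_words):
        self.words = [0] * n_words          # words addressed 0 .. N-1

    def cycle(self, control, address, data=None):
        if control == "MW":                 # Memory Write: store data, no output
            self.words[address] = data
            return None
        if control == "MR":                 # Memory Read: output the addressed word
            return self.words[address]

mem = MemoryModule(8)
mem.cycle("MW", address=3, data=42)         # write 42 into location 3
print(mem.cycle("MR", address=3))           # read it back: 42
```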
I/O Module
I/O is functionally similar to memory. The possible inputs and outputs for a
typical I/O module are shown in figure 5.1(b).
Figure 5.1: b) I/O Module
There are two operations: read and write. An I/O module may control more
than one external device. Each interface to an external device is referred
to as a port, and each port is given a unique address (0, 1, ..., M-1). There
are also external data paths for the input and output of data with an
external device. In addition, an I/O module may be able to send interrupt
signals to the CPU.
CPU Module
The possible inputs and outputs for a typical CPU module are shown in
figure 5.1(c). The CPU reads in instructions and data, and writes out data
after processing. It uses control signals to control the overall operation of
the system. It also receives interrupt signals and takes the appropriate
action.
Figure 5.1: c) CPU Module
Different types of transfers
There are two types of transfers that are classified; one depending on the
devices that are to be interconnected and the other depending on whether it
carries information one bit or several bits.
Types of transfers between different modules
Following are the possible ways of transfer of information between the three
modules of the system:
Memory to Processor (CPU): The processor reads an instruction or a
unit of data from memory.
CPU to Memory: The processor writes a unit of data to memory.
I/O to CPU: The processor reads data from an I/O device via an I/O
module.
CPU to I/O: The processor sends data to the I/O device.
I/O to or from memory: For these two cases, an I/O module is allowed
to exchange data directly with memory, without going through the
processor, using Direct Memory Access (DMA).
Serial & Parallel transfer
A bus consists of multiple communication lines (or pathways). Each line is
capable of transmitting signals representing binary 1 or binary 0. Bits are
transmitted in Serial & Parallel. When only one bit is carried at a time it
requires only a single wire as shown in figure 5.2. When we consider many
bits to be transmitted at a time it requires many wires as shown in figure 5.3.
These many wires together constitute a bus and the mode of transmission is
termed as parallel transmission.
Figure 5.2: Serial transmission - one bit at a time
Figure 5.3: Parallel transmission - several (8 bits here) bits at a time
The following table compares serial and parallel transmission:

Factor       Serial mode                    Parallel mode
Cost         Less costly (only one wire)    More costly (many wires)
Speed        Low (only 1 bit at a time)     High (more bits at a time)
Throughput   Low                            High
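The difference between the two modes can be sketched as follows (an illustration of the clock-cycle cost only, not a real transmission protocol):

```python
# Sketch: serial mode sends one bit per clock cycle over a single line;
# parallel mode presents all 8 bits in one cycle on lines D0..D7.
def serial_send(byte):
    # 8 clock cycles, one bit per cycle, least significant bit first
    return [(byte >> i) & 1 for i in range(8)]

def parallel_send(byte):
    # 1 clock cycle: all 8 bits appear at once on the 8 lines
    return [[(byte >> i) & 1 for i in range(8)]]

print(len(serial_send(0xA5)), "clock cycles in serial mode")      # 8
print(len(parallel_send(0xA5)), "clock cycle in parallel mode")   # 1
```

The same byte thus costs eight bus cycles serially but only one in parallel, which is the throughput difference in the table above.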
5.3 Types of Buses
As discussed earlier in unit 1, a system bus consists of a data bus, a
memory address bus and a control bus. The interconnection between the
modules of a system is as shown in figure 5.4
Figure 5.4: Bus Interconnection scheme
Data Bus: A bus which carries a word to or from memory is called a data
bus. Its width is equal to the word length of the memory. It provides a
means for moving data between the different modules of a system. The
data bus usually consists of 8, 16 or 32 separate lines; the number of lines
determines the width of the data bus.
Address bus: A bus used to carry the address of data in memory. Its width
is equal to the number of bits in the Memory Address Register (MAR) of
the memory.
Example: If a computer memory has 64K, 32-bit words, then the data bus
will be 32-bits wide and the address bus will be 16-bits wide.
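This example can be checked numerically; the helper below simply computes the number of address bits needed for a given number of words (the function name is illustrative):

```python
# Sketch: the address bus must have enough bits to name every word.
import math

def address_bus_width(n_words):
    # bits needed so that 2**bits >= n_words
    return math.ceil(math.log2(n_words))

print(address_bus_width(64 * 1024))   # 64K words  -> 16-bit address bus
print(address_bus_width(2**24))       # 16M words  -> 24-bit address bus
```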
Control bus: A bus that carries the control signals between the various
units of the computer. The processor has to send READ and WRITE
commands to the memory, each of which requires a single wire. A START
command is necessary for the I/O units. All these signals are carried by
the control bus.
Types of Control Lines
Memory Write: Causes data on the bus (data bus) to be written into the
addressed location.
Memory Read: Causes data from the addressed location to be placed
on the bus (data bus).
I/O Write: Causes data on the data bus to be output to the addressed
I/O port.
I/O Read: Causes data from the addressed I/O port to be placed on the
bus (data bus).
Transfer ACK: Indicates that data has been accepted from or placed on
the bus.
Bus Request: Indicates that a module needs to gain control of the bus.
Bus Grant: Indicates that a requesting module has been granted control
of the bus.
Interrupt Request: Indicates that an interrupt is pending.
Interrupt ACK: Acknowledges that the pending interrupt has been
recognized.
Clock: Used to synchronize operations.
Reset: Initializes all modules.
5.4 Elements of Bus Design
Bus Types: Bus lines can be separated into two types.
Dedicated: Permanently assigned to either one function or to a
physical subset of components.
Functional dedication: Bus has a specific function.
Example: Three buses are identified for carrying address, data and
control signals, as seen earlier: the Address Bus, Data Bus and
Control Bus.
Physical dedication: Refers to the use of multiple buses, each of
which connects only a subset of components using the bus.
Example: I/O buses are used only to interconnect all I/O modules.
And this bus is then connected to the main bus through some type of
an I/O adapter module.
Advantage of Physical dedication:
It offers high throughput because there is less bus contention.
Disadvantage of Physical dedication:
Increased size and cost of the system
Multiplexed: These are also referred to as non-dedicated. Same bus
may be used for various functions. The method of using the same bus
for multiple purposes is known as Time Multiplexing.
Example: Discuss the steps of actions that are to be performed so that
address and data information may be transmitted over the same set of lines.
Solution: Assume an additional control signal, called the Address Line
Enable (ALE) line, is used.
List of operations are as follows:
1. At the beginning of the data transfer the address is placed on the bus
with the ALE line activated.
2. Each module is given sufficient period of time to copy the address and
determine if it is the addressed module.
3. The address is then removed from the bus, and then same bus
connections are used for subsequent read and write data transfer with
the ALE signal deactivated.
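The three steps can be sketched as follows (a simplification that just records what the shared lines carry while ALE is active and then inactive; the sample address and data values are made up):

```python
# Sketch of time-multiplexing address and data over one set of bus lines
# using an ALE control signal.
bus_history = []                       # (ALE, value on the shared lines)

def bus_cycle(address, data):
    bus_history.append((1, address))   # ALE active: lines carry the address
    bus_history.append((0, data))      # ALE inactive: same lines carry data

bus_cycle(address=0x2050, data=0x7F)
print(bus_history)
```

Each transfer therefore occupies two phases on the same lines, which is the cost multiplexing pays for using fewer wires.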
Advantages: Multiplexing uses fewer lines, which saves space and cost.
Disadvantages of multiplexing:
1. More complex circuitry required in each module.
2. There is a potential reduction in performance, as certain events that
share the same bus cannot take place in parallel.
Method of Arbitration:
Centralized: A single hardware device (referred to as bus controller or
arbiter) is responsible for allocating time on the bus to various
components.
Distributed: In a distributed scheme, there is no central controller.
Each module contains access control logic and the modules act together
to share the bus.
In both methods, the purpose is to designate one device (either the
processor or an I/O module) as master. The master may then initiate a data
transfer with some other device which acts as slave for this particular
exchange.
Bus Timing
Synchronous: The occurrence of events on the bus is determined by a
clock; the bus includes a clock line. A single 1-0 transition of the clock
signal is referred to as a "clock cycle" or "bus cycle" and defines a time
slot. Most events occupy a single clock cycle, but some require more
cycles.
Asynchronous: The occurrence of one event on a bus follows and
depends on the occurrence of a previous event.
Synchronous timing is simpler to implement and test; however it is less
flexible.
With asynchronous timing, a mixture of slow and fast devices can share
a bus.
Bus Width:
Bus Width of Address Lines: Determines the number of memory
locations that can be addressed.
Bus Width of Data Lines: Determines the size of the unit of data
transferred at a time (8, 12, 16, 32 or 64 bits).
For example, 24 address bits can address 2^24 locations, 32 bits can
address 2^32 locations, and so on.
Bus Speed:
One important attribute of a bus is its speed. The speed of a bus refers to
how fast the data on the bus can be changed while still allowing devices to
read the values correctly. Bus speed can limit how fast a CPU can
communicate with memory, and the width of a bus can limit the speed too.
Example: The speed can be measured in, say, MHz, where 1 MHz
corresponds to 10^6 changes per second.
Data Transfer Type:
Read: (Slave to Master)
Write: (Master to Slave)
Read-Modify-Write: A read followed immediately by a write to the same
address. Usually indivisible operation to prevent any access to data by
other potential bus masters.
Read-After-Write: Indivisible operation consisting of a write followed
immediately by a read of the same address. Generally for checking
purposes.
Block: In this case, one address cycle is followed by n data cycles. The
first data item is transferred to/from the specified address, the remaining
data items are transferred to/from subsequent addresses.
5.5 Bus Structure
If a large number of devices are connected to the bus, the performance will
suffer. There are two main causes:
1. In general, the more devices attached to the bus, the greater is the bus
length, and hence the greater is the propagation delay.
2. The bus may become a bottleneck as the aggregate data transfer
demand approaches the capacity of the bus.
To overcome these problems, most modern computers have multiple
buses.
Single Bus System
In this type of interconnection, the three units share a single bus, so
information can be transferred between only two units at a time. Here the
I/O units use the same memory address space, which simplifies the
programming of I/O units since no special I/O instructions are needed.
This is one of the advantages of the single-bus organization.
Figure 5.5: Single-Bus Organization
The transfer of information over a bus cannot be done at a speed
comparable to the operating speed of all the devices connected to the bus.
Some electromechanical devices such as keyboards and printers are very
slow whereas disks and tapes are considerably faster. Main memory and
processors operate at electronic speeds. Since all the devices must
communicate over the bus, it is necessary to smooth out the differences in
timings among all the devices.
A common approach is to include a buffer register with each device to
hold the information during transfers. To illustrate this, consider the
transfer of an encoded character from the processor to a character printer,
where it is to be printed. The processor sends the character to the printer's
output buffer register over the bus. Since the buffer is an electronic
register, this transfer requires relatively little time. The printer then starts
printing; at this point the bus and the processor are no longer needed and
can be released for other activities. The buffer register is not available for
other transfers until the printing is completed. Thus the buffer register
smooths out the timing differences among the processor, memory and I/O
devices. This allows the processor to switch rapidly from one device to
another, interleaving its processing activity with data transfers involving
several I/O devices.
Two Bus Organization
Figure 5.6 shows inter-connection of various computer units through two
independent system buses. Here the I/O units are connected to the
processor through an I/O bus and the processor is connected to the memory
through the memory bus.
Figure 5.6: Two-Bus Organization
The I/O bus consists of device address bus, data bus and a control bus.
Device address bus carries the address of the I/O units to be accessed by
the processor. The data bus carries a word from the addressed input unit to
the processor and from the processor to the addressed output unit. The
control bus carries control commands such as START, STOP etc., from the
processor to I/O units and it also carries status information of I/O units to the
processor. Memory bus also consists of a Memory Address Bus (MAB),
data bus and a control bus.
In this type of inter-connection the processor completely supervises the
transfer of information to and from the I/O units. All the information is first
taken to the processor and from there to the memory. Such a data transfer
is called program-controlled transfer.
An alternative two bus structure:
There is one more method to connect processor to the I/O units. Figure 5.7
shows an alternative two bus structure. Here the I/O units are directly
connected to the memory.
Figure 5.7: Alternative Two-Bus Organization
Here the I/O devices are connected to special interface logic known as
Direct Memory Access (DMA) logic or an I/O channel, also called a
Peripheral Processing Unit (PPU). The processor issues a READ or
WRITE command giving the device address, the address of the memory
location where the data read from the input unit is to be stored or from
where the data is to be taken to output units, and the number of data words
to be transferred. This command is accepted by the PPU, which now takes
the responsibility of data transfer.
The Bus Standard
In 1984 IBM was shipping its PC AT model. The CPU, memory and I/O bus
all shared a common 8MHz clock. This became the basis for all subsequent
clone computers. The term “AT” is a registered trademark of IBM, so this I/O
bus became known as the ISA (Industry Standard Architecture) bus.
Every currently marketed PC supports some ISA interface slots. The bus
and matching adapter cards are simple and cheap. ISA is a 16-bit interface,
which means that data can be transferred only two bytes at a time. More
importantly, the ISA bus runs at only 8 MHz and it typically requires two or
three clock ticks to transfer those two bytes of data. This is not a problem for
devices that are inherently slow like the COM port (modem), the printer port,
the sound card or the CD-ROM. However, the ISA bus is too slow for high
performance disk access and therefore is not acceptable in Servers. It is
also too slow for modern Windows display adapters.
The PCI (Peripheral Component Interconnect) bus was developed by Intel.
Although it is mostly known for its CPUs, Intel also has a historical
association with Ethernet, multimedia and some disk interfaces. So Intel
was unhappy with the VLB concentration on just the video interface and
wanted to develop a general purpose bus. The objective was an interface
that was fast and inexpensive. It did not have to be simple (advances in chip
technology took care of that) and could achieve a low cost by high volume
production.
PCI is a 64-bit interface in a 32-bit package. Figuring this out requires a bit
of arithmetic. The PCI bus runs at 33 MHz and can transfer 32-bits of data
(four bytes) every clock tick. That sounds like a 32-bit bus. However, a clock
tick at 33 MHz is 30 nanoseconds and memory only has a speed of 70
nanoseconds. When the CPU fetches data from RAM, it has to wait at least
three clock ticks for the data. By transferring data every clock tick, the PCI
bus can deliver the same throughput on a 32-bit interface that other parts
of the machine deliver through a 64-bit path.
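The throughput comparison above can be checked with a short calculation. A minimal sketch (the two-ticks-per-ISA-transfer figure comes from the text; the function name is ours):

```python
# Peak transfer rate implied by a bus's clock, width, and ticks per transfer.
def bandwidth_mb_per_s(clock_mhz, bytes_per_transfer, ticks_per_transfer):
    """Bytes moved per second, expressed in MB/s (1 MB = 10**6 bytes)."""
    transfers_per_second = clock_mhz * 1e6 / ticks_per_transfer
    return transfers_per_second * bytes_per_transfer / 1e6

# ISA: 8 MHz clock, 2 bytes per transfer, assume two clock ticks per transfer.
print(bandwidth_mb_per_s(8, 2, 2))    # 8.0 MB/s
# PCI: 33 MHz clock, 4 bytes per transfer, one clock tick per transfer.
print(bandwidth_mb_per_s(33, 4, 1))   # 132.0 MB/s
```

The factor-of-16 gap is why the text calls ISA too slow for disks and display adapters, while PCI's 32-bit path can keep up with a 64-bit path elsewhere in the machine.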
The PCI bus connects at one end to the CPU/memory bus and at the other
end to a more traditional I/O bus. The PCI interface chip may support the
video adapter, the EIDE disk controller chip, and maybe two external
adapter cards. A desktop machine will have only one PCI chip, and so it will
add a number of extra ISA-only slots. A server may add additional PCI chips,
and extra server slots will usually be EISA.
While ISA and EISA are exclusively PC interfaces, the PCI bus is now used
in Power Macintosh systems and PowerPC machines. It may be attractive
for minicomputers and other RISC workstations.
5.6 Summary
There must be paths for connecting the processor, memory, and I/O modules.
The collection of paths is called the interconnection structure or the bus
structure. We have introduced different types of transfers between modules,
such as memory to processor (CPU), CPU to memory, and I/O to CPU.
Buses are called address, data, or control buses depending upon the
information they carry, that is, addresses, data, or control signals
respectively. These buses can be dedicated to one type of information or
multiplexed; they can be centralized or distributed, and synchronous or
asynchronous.
Self Assessment Questions
1. The collection of paths for connecting the modules is called the
____________.
2. The location of the memory is provided by the input called as
____________.
3. A bus which carries a word to or from memory is called ____________.
4. A bus that is used to carry control signals is ____________.
5. The method of using the same bus for multiple purposes is known as
____________.
5.7 Terminal Questions
1. Discuss the different types of Bus.
2. Give the advantages and disadvantages of physical and functional
buses.
3. Explain the Single bus structure.
4. Discuss the relative merits and de-merits of single, two and three bus
structure.
5. Discuss the methods of arbitration.
5.8 Answers
Self Assessment Questions:
1. interconnection structure
2. address
3. data bus
4. control bus
5. time multiplexing
Terminal Questions:
1. Refer Section 5.2
2. Refer Section 5.3
3. Refer Section 5.4
4. Refer Section 5.4
5. Refer Section 5.3
Unit 6 Instruction Sets:
Addressing Modes and Formats
Structure:
6.1 Introduction
Objectives
Instruction Characteristics
Instruction representation
Instruction types
Number of addresses
Instruction Set Design
6.2 Types of Operands
Data types
IBM 370 Data types
VAX Data types
6.3 Types of Operations
Data transfer
Arithmetic
Logical
Conversion
I/O, system control
Transfer of control
System Control
6.4 Addressing Modes
Direct addressing mode
Immediate addressing mode
Indirect addressing mode
Register addressing mode
Register indirect addressing mode
Displacement addressing mode
Relative addressing mode
Base Register addressing Mode
Indexing
Stack addressing
Other additional addressing modes
6.5 Instruction formats
Instruction Length
Allocation of bits
Variable length instruction
6.6 Stacks & Subroutines
Stacks
Subroutines
6.7 Summary
6.8 Terminal Questions
6.9 Answers
6.1 Introduction
The operation of a CPU is determined by the instructions it executes. These
instructions are referred to as machine instructions.
The complete collection of instructions that are executed by a CPU is
referred to as CPU’s Instruction Set. The instructions are usually
represented in binary and the program is usually written in assembly
language.
Objectives:
By the end of Unit 6, you should be able to:
1. Discuss the elements of a machine instruction.
2. Discuss the data types supported by IBM and VAX machines.
3. Discuss different addressing modes.
4. Explain with example different types of operations.
Instruction Characteristics
Elements of a Machine Instruction: Each instruction must contain the
information required by the CPU for execution. Elements of a machine
Instruction are:
Operation Code: Specifies the operation to be performed. The
operation is specified by a binary code, known as the operation code,
or opcode.
Source Operand Reference: The operation may involve one or more
source operands; that is, operands that are the inputs for the operation.
Result Operand Reference: The operation may produce a result.
Next Instruction Reference: This tells the CPU where to fetch the next
instruction after the execution of the current instruction is complete.
The next instruction to be fetched is located in main memory, or, in the
case of a virtual memory system, in either main memory or secondary
memory. In most cases, the next instruction to be fetched immediately
follows the current instruction. In those cases, there is no explicit reference
to the next instruction. But when explicit reference is needed, then the main
memory or virtual memory address must be supplied. The form in which the
address is supplied is discussed in this section.
Source and Result operands can be in one of the three areas:
Main (or virtual) memory: The memory address must be supplied.
CPU register: CPU contains one or more registers that may be
referenced by machine instructions. If only one register exists, reference
to it may be implicit. If more than one register exists, then each register
is assigned a unique number, and the instruction must contain the
number of the desired register.
I/O device: The instruction must specify the I/O module or device for the
operation. If memory-mapped I/O is used, this is just another memory
address.
Instruction Representation
Each instruction is represented by a sequence of bits. The instruction is
divided into fields, corresponding to the constituent elements of the
instruction. This layout of the instruction is called Instruction Format. With
most instruction sets, more than one format is used. It is difficult for a
programmer and the reader to deal with binary representations of machine
instructions. Hence usually the instructions are written in symbolic
representations of machine code using English like language called
mnemonics.
Thus operation codes, in short Opcodes are represented using mnemonics.
The format for this instruction is given in figure 6.1.
Opcode Operand Reference Operand Reference
Figure 6.1: A simple instruction format
In most modern CPUs, the first byte contains the opcode, sometimes
including a register reference in some of the instructions. The operand
references are in the following bytes (byte 2, 3, and so on).
Examples of few mnemonics:
Table 6.1: Examples of mnemonics
Mnemonics Description
ADD add
SUB subtract
MUL Multiply
LOAD Load data from memory
MOV A, B Move contents of B register to A register, where A & B are user visible registers of CPU
Example 1: An instruction with a register and memory address operands.
8 bits 8 bits 16/32 bits
Opcode Operand Ref. Operand Reference
register reference memory address
Example 2: An instruction may span two bytes. For example, a 2-byte
instruction can be as follows:
12 bits 4 bits
Opcode Operand Ref.
register reference
For example consider an instruction,
ADD R, Y
Assuming that it means that add the contents of data location Y of the
memory to the contents of register R. The operation is performed on the
data present in location Y and not on its address i.e., Y itself.
Prior to the execution of the add instruction, suppose X = 153, Y = 154,
and R = 20, and that the content of memory at location 153 is 10 and at
location 154 it is 20.
After the execution of ADD R, Y the result will be in R:
R <- R + [Y] = 20 + 20 = 40.
Instruction Types
Example: High level language statement: X = X + Y
If we assume a simple set of machine instructions, this operation could be
accomplished with three instructions: (assume X is stored in memory
location 624, and Y in memory loc. 625.)
1. Load a register with the contents of memory location 624.
2. Add the contents of memory location 625 to the register,
3. Store the contents of the register in memory location 624.
As seen, a simple "C" (or BASIC) statement may require three machine
instructions.
As we have seen before, the instructions fall into one of the following four
categories:
Data processing: Arithmetic and logic instructions.
Data storage: Memory instructions.
Data movement: I/O instructions.
Control: Test and branch instructions.
Number of Addresses
What is the maximum number of addresses one might need in an
instruction?
Virtually all arithmetic and logic operations are either unary (one operand) or
binary (two operands). The result of an operation must be stored,
suggesting a third address. Finally, after the completion of an instruction,
the next instruction must be fetched, and its address is needed.
This line of reasoning suggests that an instruction could be required to
contain 4 address references: two operands, one result, and the address of
the next instruction. In practice, the address of the next instruction is
handled by the Program Counter (PC); therefore most instructions have
one, two or three operand addresses. Three-address instruction formats are
not common, because they require a relatively long instruction format to
hold three address references.
Table 6.2: Utilization of Instruction Addresses (Non-branching Instructions)

Number of Addresses   Symbolic Representation   Interpretation
3                     OP A, B, C                A <- B OP C
2                     OP A, B                   A <- A OP B
1                     OP A                      AC <- AC OP A (AC is the accumulator)
0                     OP                        T <- T OP (T-1) (T is the top of the stack)
Example 3: Program to execute Y = (A – B) / (C + D x E):
Solution:
A) With One-Address Instructions: (requires an accumulator AC)
Table 6.3: Solution to ex.3
LOAD D AC <- D
MUL E AC <- AC x E
ADD C AC <- AC + C
STOR Y Y <- AC
LOAD A AC <- A
SUB B AC <- AC – B
DIV Y AC <- AC / Y
STOR Y Y <- AC
B) With Two-Address Instructions:
Table 6.4: Solution to ex.3
MOVE Y, A Y <- A
SUB Y, B Y <- Y – B
MOVE T, D T <- D
MUL T, E T <- T x E
ADD T, C T <- T + C
DIV Y, T Y <- Y / T
C) With Three-Address Instructions:
Table 6.5: Solution to ex.3
SUB Y, A, B Y <- A – B
MUL T, D, E T <- D x E
ADD T, T, C T <- T + C
DIV Y, Y, T Y <- Y / T
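The one-address program of Table 6.3 can be traced with a minimal accumulator-machine interpreter. A sketch; the memory values for A through E are arbitrary sample data:

```python
# Minimal accumulator machine executing the one-address program of Table 6.3.
# Sample data: A=20, B=2, C=4, D=3, E=2, so Y = (20-2)/(4+3*2) = 1.8.
mem = {'A': 20, 'B': 2, 'C': 4, 'D': 3, 'E': 2, 'Y': 0}

program = [('LOAD', 'D'), ('MUL', 'E'), ('ADD', 'C'), ('STOR', 'Y'),
           ('LOAD', 'A'), ('SUB', 'B'), ('DIV', 'Y'), ('STOR', 'Y')]

ac = 0  # the accumulator
for op, addr in program:
    if op == 'LOAD':
        ac = mem[addr]
    elif op == 'STOR':
        mem[addr] = ac
    elif op == 'ADD':
        ac += mem[addr]
    elif op == 'SUB':
        ac -= mem[addr]
    elif op == 'MUL':
        ac *= mem[addr]
    elif op == 'DIV':
        ac /= mem[addr]

print(mem['Y'])  # 1.8
```

Note how the program must store the denominator into Y temporarily, because every instruction implicitly routes through the single accumulator.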
Instruction Set Design
The design of an instruction set is very complex, since it affects many
different aspects of the computer system. The instruction set defines many
of the functions performed by the CPU, and it is the programmer's means of
controlling the CPU.
The fundamental design issues in designing an instruction set are:
Operation Repertoire: How many and which operations to provide, and
how complex the operations should be.
Data Types: The various types of data upon which operations are
performed.
Instruction Format: Instruction length (in bits), number of addresses,
sizes of various fields, and so on.
Registers: Number of CPU registers that can be referenced by
instructions and their use.
Addressing: The mode or modes by which the address of an operand is
specified.
These issues are highly interrelated and must be considered together in
designing an instruction set.
6.2 Types of Operands
Data Types
The most important general categories of data are:
Addresses
Numbers
Characters
Logical data.
Numbers
All machine languages include numeric data types. Even in non-numeric data
processing, there is a need for numbers to act as counters, field widths,
and so on. The numbers stored in a computer are limited: there is a limit to
the magnitude of numbers that can be represented on a machine. The
programmer is also faced with understanding the consequences of rounding,
overflow, and underflow.
Three types of numerical data are commonly used in computers. They are:
1. Integer or fixed point
2. Floating point (real numbers)
3. Decimal (Remember BCD; an instruction set may be able to process
BCD numbers.)
1. Integer data type:
For unsigned integers: It is a straightforward, simple binary representation.
In an N-bit word, all N bits hold the magnitude of the number.
For example
00000000 = 0
00000001 = 1
00101001 = 41
10000000 = 128
11111111 = 255
For signed integers: The Most Significant Bit (MSB), which is the leftmost
bit in a word, is treated as a sign bit: if the MSB is 1 the number is
considered negative, otherwise positive. In an N-bit word, the remaining
(N-1) bits hold the magnitude of the number. (This is the sign-magnitude
representation.)
For example
00000000 = 0
00000001 = 1
00010010 = 18
10010010 = -18
11111111 = -127
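The two readings of an 8-bit pattern described above can be sketched as follows (the helper names are ours):

```python
# Interpret an 8-bit pattern as unsigned binary and as sign-magnitude.
def unsigned(bits):
    return int(bits, 2)

def sign_magnitude(bits):
    magnitude = int(bits[1:], 2)            # the low 7 bits hold the magnitude
    return -magnitude if bits[0] == '1' else magnitude

print(unsigned('00101001'))         # 41
print(sign_magnitude('10010010'))   # -18  (same low bits as +18)
print(sign_magnitude('11111111'))   # -127
```

The same bit pattern 10010010 reads as 146 unsigned but -18 in sign-magnitude; the interpretation is a property of the instruction, not of the stored bits.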
2. Floating point numbers: These numbers are represented as ±S × B^E,
where the sign is plus or minus,
S is the significand,
B is the base (2 for binary), and
E is the exponent.
Figure 6.2: Format of a floating point number (bit 0: sign bit;
bits 1–8: biased exponent; bits 9–31: significand)
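The field layout of Figure 6.2 (1 sign bit, an 8-bit biased exponent, a 23-bit significand) matches IEEE 754 single precision, so Python's struct module can pull the fields out of a real 32-bit float. A sketch; float_fields is our own helper name:

```python
import struct

# Split a 32-bit float into the fields of Figure 6.2: sign, 8-bit biased
# exponent, 23-bit significand. This is the IEEE 754 single-precision
# layout; the exponent bias is 127.
def float_fields(x):
    word, = struct.unpack('>I', struct.pack('>f', x))
    sign        = word >> 31
    exponent    = (word >> 23) & 0xFF
    significand = word & 0x7FFFFF
    return sign, exponent, significand

print(float_fields(1.0))    # (0, 127, 0)       1.0 = +1.0 x 2^(127-127)
print(float_fields(-2.5))   # (1, 128, 2097152) -2.5 = -1.25 x 2^(128-127)
```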
3. Decimal: Usual decimal system.
Characters
International Reference Alphabet (IRA) is referred to as ASCII in the USA.
Each character in this code is represented by a unique 7-bit pattern, thus
128 different characters can be represented. Some of the patterns represent
control characters. ASCII – encoded characters are usually stored and
transferred as 8 bits per character. The eighth bit may be set to 1 or 0 for
even parity. The EBCDIC character set is used in IBM 370 machines; it is an
8-bit code.
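The even-parity convention for the eighth bit can be sketched as follows (with_even_parity is our own helper name):

```python
# Compute the even-parity eighth bit for a 7-bit ASCII code: the parity
# bit is chosen so that the total number of 1s in the byte is even.
def with_even_parity(ch):
    code = ord(ch)                    # 7-bit ASCII code
    parity = bin(code).count('1') % 2
    return (parity << 7) | code       # parity goes in the high (eighth) bit

print(format(with_even_parity('A'), '08b'))  # 'A' = 1000001, two 1s -> parity 0
print(format(with_even_parity('C'), '08b'))  # 'C' = 1000011, three 1s -> parity 1
```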
Logical Data
When a word is viewed as a collection of individual bits rather than as a
single unit, memory can be used more efficiently for storing such data.
Logical values are also called Boolean values: 1 = true, 0 = false.
IBM 370 Data types
The IBM S/370 architecture provides the following data types.
Binary integer: Binary integer may be either unsigned or signed.
Signed binary integers are stored in 2's complement form. Allowable
lengths are 16 and 32 bits.
Floating point: Floating point numbers of length 32, 64, and 128 bits
are allowed. All use a 7-bit exponent field.
Decimal: Arithmetic on packed decimal integers is provided. The length
is from 1 to 16 bytes. The rightmost 4 bits of the rightmost byte hold the
sign. Hence signed numbers from 1 to 31 decimal digits can be
represented.
Binary logical: Operations are defined for data units of length 8, 32,
and 64 bits, and for variable-length logical data of up to 256 bytes.
Character: EBCDIC is used.
VAX Data types
The VAX provides an impressive array of data types. It is a byte-oriented
machine: all data types are multiples of bytes, including the 16-bit word,
the 32-bit longword, the 64-bit quadword, and even the 128-bit octaword.
The VAX provides the following five types of data.
Binary integer: Binary integers are usually treated as being in 2's
complement form; however, they can also be treated and operated on as
unsigned integers. Allowable lengths are 8, 16, 32, 64, and 128 bits.
Floating point: It provides four different types of representations. They
are
F: 32 bits with an 8-bit exponent
D: 64 bits with an 8-bit exponent
G: 64 bits with an 11-bit exponent
H: 128 bits with a 15-bit exponent
The F type is the normal or default representation. D is the usual
double-precision representation. G and H are provided for a variety of
applications, giving successively increasing range and precision over F.
Decimal: Arithmetic on packed decimal integers is provided. Two
formats are provided.
Packed decimal strings: The length is from 1 to 16 bytes with 4 bits holding
the sign.
Unpacked numeric strings: Store one digit per byte in ASCII representation,
with a length of up to 31 bytes.
Variable bit field: These are small integers packed together in large
data unit. A bit field is specified by three operands: address of byte
containing the start of field, starting bit position within the byte, the
length in bits of the field. This data type is used to increase memory
efficiency.
Character: ASCII is used.
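The three-operand bit-field specification above maps naturally onto code. A sketch, assuming the VAX's convention of numbering bits from the least significant bit of the base byte (extract_field is our own name; the data are arbitrary):

```python
# Extract a VAX-style variable bit field from a bytearray. The field is
# named by (byte address, starting bit position, length in bits), and may
# cross byte boundaries. Bits are numbered little-endian from the base byte.
def extract_field(memory, byte_addr, start_bit, length):
    # Gather enough bytes to cover the field, then shift and mask.
    span = (start_bit + length + 7) // 8
    word = int.from_bytes(memory[byte_addr:byte_addr + span], 'little')
    return (word >> start_bit) & ((1 << length) - 1)

mem = bytearray([0b10110100, 0b00000111])
print(extract_field(mem, 0, 2, 4))   # bits 2..5 of byte 0 -> 0b1101 = 13
print(extract_field(mem, 0, 6, 4))   # crosses into byte 1 -> 0b1110 = 14
```

Packing several small integers this way trades a little extraction work for memory efficiency, which is exactly the motivation the text gives.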
6.3 Types of Operations
A set of general types of operations is as follows:
Data transfer
It is the most fundamental type of machine instruction. The data transfer
must specify
1) The location of the source and destination. Each location can be
memory, register or top of stack.
2) The length of the data to be transferred.
3) The mode of addressing for each operand.
If both the operands are CPU registers, then the CPU simply causes data to
be transferred from one register to another. This operation is internal to the
CPU. If one or both operands are in memory, then the CPU must perform
some or all of the following actions:
1. Calculate the memory address, based on the address mode.
2. If the address refers to virtual memory, translate from virtual to actual
memory address.
3. Determine whether the addressed item is in the cache.
4. If not, issue a command to the memory module.
For example: Move, Store, Load (Fetch), Exchange, Clear (Reset), Set,
Push, Pop
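The four actions listed above can be sketched with toy stand-ins for the page table, cache, and main memory (every structure and value here is illustrative, not any real machine's):

```python
# Toy walk-through of the four actions for a memory operand: effective
# address calculation, virtual-to-physical translation, cache lookup,
# and a main-memory access on a miss.
page_table = {0: 5}                  # virtual page -> physical frame
cache = {0x5004: 42}                 # physical address -> cached word
main_memory = {0x5004: 42, 0x5008: 7}
PAGE = 0x1000                        # 4 KB pages

def read_operand(base, displacement):
    virtual = base + displacement                                   # step 1
    physical = page_table[virtual // PAGE] * PAGE + virtual % PAGE  # step 2
    if physical in cache:                                           # step 3
        return cache[physical], 'hit'
    value = main_memory[physical]                                   # step 4
    cache[physical] = value          # fill the cache on the way back
    return value, 'miss'

print(read_operand(0x0000, 0x4))   # (42, 'hit')
print(read_operand(0x0000, 0x8))   # (7, 'miss')
```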
Arithmetic
Most machines provide basic arithmetic functions like Add, Subtract,
Multiply, and Divide. They are invariably provided for signed integer
numbers. Often they are also provided for floating point and packed decimal
numbers. Some operations also involve only a single operand: Absolute
takes the absolute value of the operand, Negate takes the complement
(negative) of the operand, Increment adds 1 to the operand, and
Decrement subtracts 1 from the operand.
Logical
Machines also provide a variety of operations for manipulating individual bits
of a word often referred to as bit twiddling. They are based on Boolean
operations like AND, OR, NOT, XOR, Test, Compare, Shift, Rotate, Set
control variables.
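These bit-twiddling operations can be sketched on an 8-bit word (the value and bit positions are arbitrary):

```python
# Common bit-twiddling operations on an 8-bit word.
word = 0b10110010

set_bit3    = word | (1 << 3)           # OR with a mask sets a bit
clear_bit7  = word & ~(1 << 7) & 0xFF   # AND with a mask clears a bit
toggle_bit0 = word ^ 1                  # XOR flips a bit
shifted     = (word << 1) & 0xFF        # logical shift left, kept to 8 bits
rotated     = ((word << 1) | (word >> 7)) & 0xFF  # rotate left by 1

print(format(set_bit3, '08b'))    # 10111010
print(format(clear_bit7, '08b'))  # 00110010
print(format(rotated, '08b'))     # 01100101
```

Note the difference between the shift (the top bit falls off) and the rotate (the top bit wraps around to bit 0).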
Conversion
These instructions are the ones that change the format or operate on the
format of the data. Examples are Translate, Convert.
I/O, System control
There is a variety of approaches like isolated programmed I/O, memory
mapped I/O, DMA and so on. Examples: Output (Write), Start I/O, Test I/O.
Transfer of Control
In the normal course of events, the next instruction to be performed is the
one that immediately follows the current instruction in memory.
Transfer-of-control instructions, however, specify the location of the next
instruction that is to be executed. For example: Jump (Branch), Jump
Conditional, Jump to Subroutine, Return, Execute, Skip, Skip Conditional,
Halt, Wait (Hold), and No Operation.
System Control
These instructions are reserved for the use of Operating System. These
instructions can only be executed while the processor is in a privileged state,
or is executing a program in a special privileged area of memory.
6.4 Addressing Modes
In general, a program operates on data that reside in the computer’s
memory. These data can be organized in a variety of ways. If we want to
keep track of students’ names, we can write them in a list. If we want to
associate information with each name, for example to record telephone
numbers or marks in various courses, we may organize this information in
the form of a table. Programmers use organizations called data structures
to represent the data used in computations. These include lists, linked lists,
arrays, queues, and so on.
Programs are normally written in a high-level language, which enables the
programmer to use constants, local and global variables, pointers, and
arrays. When translating a high-level language program into assembly
language, the compiler must be able to implement these constructs using
the facilities provided in the instruction set of the computer in which the
program will be run. The different ways in which the location of an operand
is specified in an instruction are referred to as addressing modes.
There are the following addressing modes:
Immediate Addressing
Direct Addressing
Indirect Addressing
Register Addressing
Register Indirect Addressing
Displacement Addressing
Stack Addressing
Notation:
A = Contents of an address field in the instruction.
R = Contents of an address field in the instruction that refers to a
register.
EA = Effective (actual) Address of the location containing the referenced
operand.
(X) = Contents of location X.
Table 6.6: Basic Addressing Modes

Mode               Algorithm           Principal Advantage    Principal Disadvantage
Immediate          Operand = A         No memory reference    Limited operand magnitude
Direct             EA = A              Simple                 Limited address space
Indirect           EA = (A)            Large address space    Multiple memory references
Register           EA = R              No memory reference    Limited address space
Register Indirect  EA = (R)            Large address space    Extra memory reference
Displacement       EA = A + (R)        Flexibility            Complexity
Stack              EA = top of stack   No memory reference    Limited capability
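The Algorithm column of Table 6.6 maps directly onto code. A sketch with a toy memory and register file (all addresses, values, and the function name are made up for illustration):

```python
# Toy machine state for illustrating the algorithms of Table 6.6.
memory = {100: 7, 200: 100, 300: 55, 400: 9}
registers = {'R1': 300, 'R2': 12}

def fetch_operand(mode, A=None, R=None):
    if mode == 'immediate':         return A                          # operand = A
    if mode == 'direct':            return memory[A]                  # EA = A
    if mode == 'indirect':          return memory[memory[A]]          # EA = (A)
    if mode == 'register':          return registers[R]               # EA = R
    if mode == 'register_indirect': return memory[registers[R]]       # EA = (R)
    if mode == 'displacement':      return memory[A + registers[R]]   # EA = A + (R)
    raise ValueError(mode)

print(fetch_operand('immediate', A=200))            # 200
print(fetch_operand('direct', A=200))               # 100
print(fetch_operand('indirect', A=200))             # 7
print(fetch_operand('register', R='R2'))            # 12
print(fetch_operand('register_indirect', R='R1'))   # 55
print(fetch_operand('displacement', A=100, R='R1')) # 9
```

The same address field 200 yields three different operands depending on the mode, which is the whole point of the table.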
Direct Addressing Mode
EA = A.
Address field contains address of operand.
Effective address (EA) = address field (A) e.g. ADD A
Add contents of cell A to accumulator.
Look in memory at address A for operand.
Single memory reference to access data.
No additional calculations to work out effective address.
Limited address space
Figure 6.3
– The operand is in a memory location; the address of this location is given
explicitly in the instruction. (In some assembly languages, this mode is
called Absolute.)
The instruction Move LOC, R2 uses two of these modes: the memory
location LOC is specified in the Direct (Absolute) mode, and R2 in the
Register mode. Processor registers are used as temporary storage locations,
where the data in a register are accessed using the Register mode. The
Absolute mode can represent
global variables in a program. A declaration such as Integer A, B; in a high-
level language program will cause the compiler to allocate a memory
location to each of the variables A and B. Whenever they are referenced
later in the program, the compiler can generate assembly language
instructions that use the Absolute mode to access these variables. Next, let
us consider the representation of constants. Address and data constants
can be represented in assembly language using the immediate mode.
Immediate Addressing Mode
The operand is actually present in the instruction.
Operand = A
Can be used to define and use constants, or set initial values.
Operand is part of instruction
Operand = address field
e.g. ADD 5
Add 5 to contents of accumulator
5 is operand
No memory reference to fetch data
Fast
Limited range
Figure 6.4: Immediate
For example, the instruction Move 200immediate, R0 (where the subscript
names the addressing mode) places the value 200 in register R0. Clearly,
the Immediate mode is only used to specify the value of a source operand.
However, using a subscript to denote the addressing mode is not practical
in assembly languages. A common convention is to use the sharp sign (#) in
front of the value to indicate that this value is to be used as an
immediate operand.
Hence, we write the instruction above in the form Move # 200, R0. Constant
values are used frequently in high-level language programs. For example,
the statement
A = B + 6 contains the constant 6. Assuming that A and B have been
declared earlier as variables and may be accessed using the Absolute
mode, this statement may be compiled as follows:
Move B, R1
Add #6, R1
Move R1, A
Constants are also used in assembly language to increment a counter, test
for some bit pattern, and so on.
Indirect Addressing Mode
Memory cell pointed to by address field contains the address of (pointer to) the operand
EA = (A)
Look in A, find address (A) and look there for operand
e.g. ADD (A)
Add contents of cell pointed to by contents of A to accumulator
Large address space
2^n where n = word length
May be nested, multilevel, cascaded
e.g. EA = (((A))) Draw the diagram yourself
Multiple memory accesses to find operand
Hence slower
Figure 6.5
Indirect mode – The effective address of the operand is the contents of a
register or memory location whose address appears in the instruction.
We denote indirection by placing the name of the register or the memory
address given in the instruction in parentheses as illustrated in Figure and
Table.
To execute the Add instruction in Figure 6.6a the processor uses the value
B, which is in register R1, as the effective address of the operand. It
requests a read operation from the memory to read the contents of location
B. The value read is the desired operand, which the processor adds to the
contents of register R0. Indirect addressing through a memory location is
also possible as shown in Figure 6.6b. In this case, the processor first reads
the contents of memory location A,
Figure 6.6
Figure 6.7
then requests a second read operation using the value B as an address to
obtain the operand. The register or memory location that contains the
address of an operand is called a pointer. Indirection and the use of pointers
are important and powerful concepts in programming. Consider the analogy
of a treasure hunt: In the instructions for the hunt you may be told to go to a
house at a given address. Instead of finding the treasure there, you find a
note that gives you another address where you will find the treasure.
By changing the note, the location of the treasure can be changed, but the
instructions for the hunt remain the same. Changing the note is equivalent to
changing the contents of a pointer in a computer program. For example, by
changing the contents of register R1 or location A in Figure the same Add
instruction fetches different operands to add to register R0.
For adding a list of numbers, indirect addressing can be used to access
successive numbers in the list, resulting in the program shown in Figure 6.7.
Register R2 is used as a pointer to the numbers in the list, and the operands
are accessed indirectly through R2. The initialization section of the program
loads the counter value n from memory location N into R1 and uses the
immediate addressing mode to place the address value NUM1, which is the
address of the first number in the list, into R2. Then it clears R0 to 0.
The instruction Add (R2), R0 fetches the operand at location NUM1 and
adds it to R0. The second Add instruction adds 4 to the contents of the
pointer R2, so that it will contain the address value NUM2 when the above
instruction is executed in the second pass through the loop.
Consider the C-language statement A = *B; where B is a pointer variable.
This statement may be compiled into Move B, R1
Move (R1), A
Using indirect addressing through memory, the same action can be
achieved with Move (B), A. Despite its apparent simplicity, indirect
addressing through memory has proven to be of limited usefulness as an
addressing mode, and it is seldom found in modern computers.
Indirect addressing through registers is used extensively. The program in
Figure 6.7 shows the flexibility it provides. Also, when absolute addressing is
not available, indirect addressing through registers makes it possible to
access global variables by first loading the operand’s address in a register.
Register Addressing Mode
o Operand is held in the register named in the address field
o EA = R
o Limited number of registers
o Very small address field needed
o Shorter instructions
o Faster instruction fetch
o No memory access
o Very fast execution
o Very limited address space
o Multiple registers help performance
o Requires good assembly programming or compiler writing
o N.B. in C programming: register int a;
Figure 6.8
Figure 6.9
Figure 6.10
Figure 6.11
Register Indirect Addressing Mode
o EA = (R)
o Operand is in the memory cell pointed to by the contents of register R
o Large address space (2^n)
o Fewer memory accesses than indirect addressing
Indexing and pointers
The next addressing mode we discuss provides a different kind of flexibility
for accessing operands. It is useful in dealing with lists and arrays.
Index mode – The effective address of the operand is generated by adding
a constant value to the contents of a register. The register used may be
either a special register provided for this purpose, or, more commonly, it
may be any one of a set of general-purpose registers in the processor. In
either case, it is referred to as an index register. We indicate the Index
mode symbolically as X(Ri), where X denotes the constant value contained
in the instruction and Ri is the name of the register involved. The effective
address of the operand is given by EA = X + [Ri]. The contents of the index
register are not changed in the process of generating the effective address.
In an assembly language program, the constant X may be given either as an
explicit number or as a symbolic name representing a numerical value.
When the instruction is translated into machine code, the constant X is given
as a part of the instruction and is usually represented by fewer bits than the
word length of the computer. Since X is a signed integer, it must be sign-
extended to the register length before being added to the contents of the
register.
Figure 6.12 illustrates two ways of using the Index mode. In Figure 6.12a,
the index register, R1, contains the address of a memory location, and the
value X defines an offset (also called a displacement) from this address to
the location where the operand is found.
An alternative use is illustrated in Figure 6.12b. Here, the constant X
corresponds to a memory address, and the contents of the index register
define the offset to the operand. In either case, the effective address is the
sum of two values; one is given explicitly in the instruction, and the other is
stored in a register.
Figure 6.12
Displacement Addressing Mode
o EA = A + (R)
o Address field holds two values:
o A = base value
o R = register that holds the displacement
o (or vice versa)
Relative Addressing Mode
o A version of displacement addressing
o R = Program counter, PC
o EA = A + (PC)
o i.e. the operand is A cells away from the location currently pointed to by the PC
Base-Register Addressing
o A holds displacement
o R holds pointer to base address
o R may be explicit or implicit
o e.g. segment registers in 80x86
Indexing
o A = base
o R = displacement
o EA = A + R
o Good for accessing arrays:
EA = A + R
R++
Stack Addressing
o Operand is (implicitly) on top of stack
o e.g. ADD: pop the top two items from the stack and add them
Other additional addressing modes
The two modes described next are useful for accessing data items in
successive locations in the memory.
Autoincrement mode – The effective address of the operand is the
contents of a register specified in the instruction. After accessing the
operand, the contents of this register are automatically incremented to point
to the next item in a list.
We denote the Autoincrement mode by putting the specified register in
parentheses, to show that the contents of the register are used as the
effective address, followed by a plus sign to indicate that these contents are
to be incremented after the operand is accessed. Thus, the Autoincrement
mode is written as (Ri)+.
6.5 Instruction Formats
Instruction Format is defined as the layout of bits in an instruction, in terms
of its constituent parts. An instruction format must include an opcode,
implicitly or explicitly, and one or more operands. In practice, most
instruction sets have more than one instruction format.
Instruction Length
The most important design issue is the length of an instruction. It both
affects and is affected by memory size, memory organization, bus structure,
CPU complexity, and CPU speed. There is a trade-off between a powerful
instruction repertoire and saving space.
Apart from this trade-off, there are other considerations. Either the
instruction length should be equal to the memory transfer length, or one
should be a multiple of the other. A related consideration is the memory
transfer rate.
Allocation of Bits
The factors that determine the use of addressing bits are:
Number of addressing modes: Sometimes the addressing mode is implicit
in the instruction, or certain opcodes may call for indexing. In other cases
the addressing mode must be explicit, and one or more bits are needed.
Number of operands: Today's machines typically provide for two operands.
Each operand may require its own mode indicator, or the use of a mode
indicator may be limited to one of the address fields.
Register versus memory: A machine must have registers so that the
data can be brought into the CPU for processing. One operand address
is implicit. The more that registers can be used to specify operands, the
fewer bits are needed.
Number of register sets: Almost all machines have a set of general
purpose registers, with typically 8 or 16 registers in it. These registers
can be used to store data or addresses for displacement addressing etc.
Address range: For addresses that refer to the memory locations, the
range of addresses is related to the number of address bits. Because of
this limitation direct addressing is rarely used.
Address granularity: It is concerned with addresses that refer to the
memory other than registers. In a system with 16 or 32 bit words, an
address can refer to a word or a byte, at the designer's choice. Byte
addressing is convenient for character manipulation but, for a fixed-size
memory, requires more address bits.
Variable-Length Instruction
The designer might provide a variety of instruction formats of different
lengths. Addressing can be more flexible, with various combinations of
registers and memory references plus addressing modes.
The price that needs to be paid is an increase in the complexity of the CPU:
1. due to the varying number of operands, and
2. due to the varying lengths of the opcode in some CPUs.
6.6 Stacks and Subroutines
Stacks
A computer program often needs to perform a particular subtask using the
familiar subroutine structure. In order to organize the control and information
linkage between the main program and the subroutine, a data structure
called a stack is used.
This section will describe stacks, as well as a closely related data structure
called a queue.
Data operated on by a program can be organized in a variety of ways. We
have already encountered a data structure called a list. Now, we consider an
important data structure known as a stack. A stack is a list of data elements,
usually words or bytes, with the accessing restriction that elements can be
added or removed at one end of the stack only. This end is called the top of
the stack, and the other end is called the bottom.
The structure is sometimes referred to as a pushdown stack. Imagine a pile
of trays in a cafeteria; customers pick up new trays from the top of the pile,
and clean trays are added to the pile by placing them onto the top of the pile.
Another descriptive phrase, last-in-first-out (LIFO) stack, is also used to
describe this type of storage mechanism; the last data item placed on the
stack is the first one removed when retrieval begins. The terms push and
pop are used to describe placing a new item on the stack and removing the
top item from the stack, respectively.
Data stored in the memory of a computer can be organized as a stack, with
successive elements occupying successive memory locations. Assume that
the first element is placed in location BOTTOM, and when new elements are
pushed onto the stack, they are placed in successively lower address
locations. We use a stack that grows in the direction of decreasing memory
addresses in our discussion, because this is a common practice.
Figure 6.13 shows a stack of word data items in the memory of a computer.
It contains numerical values, with 43 at the bottom and 28 at the top. A
processor register is used to keep track of the address of the element of the
stack that is at the top at any given time. This register is called the stack
pointer (SP). It could be one of the general-purpose registers or a register
dedicated to this function.
Figure 6.13
If we assume a byte-addressable memory with a 32-bit word length, the
push operation can be implemented as
Subtract #4, SP
Move NEWITEM, (SP)
where the Subtract instruction subtracts the source operand 4 from the
destination operand contained in SP and places the result in SP. These two
instructions move the word from location NEWITEM onto the top of the
stack, decrementing the stack pointer by 4 before the move. The pop
operation can be implemented as
Move (SP), ITEM
Add #4, SP
These two instructions move the top value from the stack into location ITEM
and then increment the stack pointer by 4 so that it points to the new top
element. Figure 6.14 shows the effect of each of these operations on the
stack in Figure 6.13. If the processor has the Autoincrement and
Autodecrement addressing modes, then the push operation can be
performed by the single instruction Move NEWITEM, -(SP), and the pop
operation can be performed by Move (SP)+, ITEM.
When a stack is used in a program, it is usually allocated a fixed amount of
space in the memory. In this case, we must avoid pushing an item onto the
stack when the stack has reached its maximum size. Also, we must avoid
attempting to pop an item off an empty stack, which could result from a
programming error. Suppose that a stack runs from location 2000
(BOTTOM) down no further than location 1500. The stack pointer is loaded
initially with the address value 2004. Recall that SP is decremented by 4
before new data are stored on the stack. Hence, an initial value of 2004
means that the first item pushed onto the stack will be at location 2000. To
prevent either pushing an item on a full stack or popping an item off an
empty stack, the single-instruction push and pop operations can be replaced
by the instruction sequences shown in Figure 6.15.
Compare instruction: Compare src, dst performs the operation [dst] - [src]
and sets the condition code flags according to the result. It does not change
the value of either operand.
Figure 6.14
Figure 6.15
Subroutines
In a given program, it is often necessary to perform a particular subtask
many times on different data values. Such a subtask is usually called a
subroutine. For example, a subroutine may evaluate the sine function or
sort a list of values into increasing or decreasing order. It is possible to
include the block of instructions that constitute a subroutine at every place
where it is needed in the program. However, to save space, only one copy
of the instructions that constitute the subroutine is placed in the memory,
and any program that requires the use of the subroutine simply branches to
its starting location. When a program branches to a subroutine we say that it
is calling the subroutine. The instruction that performs this branch operation
is named a Call instruction.
After a subroutine has been executed, the calling program must resume
execution, continuing immediately after the instruction that called the
subroutine. The subroutine is said to return to the program that called it by
executing a Return instruction. Since the subroutine may be called from
different places in a calling program, provision must be made for returning to
the appropriate location. The location where the calling program resumes
execution is the location pointed to by the updated PC while the Call
instruction is being executed. Hence, the contents of the PC must be saved
by the Call instruction to enable correct return to the calling program.
Figure 6.16: An example of CALL and RETURN
The way in which a computer makes it possible to call and return from
subroutines is referred to as its subroutine linkage method. The simplest
subroutine linkage method is to save the return address in a specific
location, which may be a register dedicated to this function. Such a register
is called the link register. When the subroutine completes its task, the
Return instruction returns to the calling program by branching indirectly
through the link register.
The Call instruction is just a special branch instruction that performs the
following operations:
Store the contents of the PC in the link register
Branch to the target address specified by the instruction
The Return instruction is a special branch instruction that performs the
operation:
Branch to the address contained in the link register. Figure 6.16 illustrates
this procedure.
6.7 Summary
This chapter covers the representation of instructions and discusses
Arithmetic and logic instructions, Memory instructions, I/O instructions, and
Test and branch instructions. We have also seen different types of
addressing modes like direct, indirect, immediate and register, including the
important concepts of pointers and indexed addressing. Here we also see
the different data types like Addresses, Numbers, Characters, and Logical
data. Numbers can be integers or decimal (floating point). As an example
we have studied data types for two machines IBM and VAX. Finally the
concept of subroutine with the application of the stack data structure is
discussed.
Self Assessment Questions
1. ___________ specifies the operation to be performed.
2. Opcodes are represented using ___________.
3. Floating point numbers are ___________.
4. ___________ is referred as ASCII in USA.
5. ___________ instructions are reserved for the use of operating system.
6.8 Terminal Questions
1. Define the following:
a. Op-code.
b. Machine instruction.
2. Discuss the different categories of instructions.
3. Write a note on different data types with examples for each.
4. Give the details of Data types specified for VAX & IBM 370 machines.
5. Discuss in detail, with examples, the addressing modes that do not use
memory.
6. Explain the calculation of effective address in case of Displacement
addressing and relative addressing modes.
7. Write a note on:
a. Stack
b. Subroutines
6.9 Answers
Self Assessment Questions:
1. operation code or opcode
2. mnemonics
3. real numbers
4. international reference alphabet (IRA)
5. system control
Terminal Questions:
1. Refer Section 6.1
2. Refer Section 6.1
3. Refer Section 6.2
4. Refer Section 6.2
5. Refer Section 6.4
6. Refer Section 6.4
7. Refer Section 6.6
Unit 7 Arithmetic Logic Unit
Structure:
7.1 Introduction
Objectives
7.2 Arithmetic Logic Unit
7.3 Number Representations
Non-negative Integers
Negative Integers
Infinite-Precision Ten's Complement
Finite-Precision Ten's Complement
Finite-Precision Two's Complement
Rational Numbers
7.4 Summary
7.5 Terminal Questions
7.6 Answers
7.1 Introduction
In this unit we focus on the ALU, which together with the control unit forms
the main components of the processing unit. We have seen in the
preceding units the tasks that the CPU must perform to execute an
instruction: it fetches the instruction, interprets it, fetches data, processes
the data, and finally writes the result into the appropriate location. An
internal CPU bus is needed to transfer data between the various registers
and the ALU, because the ALU operates only on data in the internal
memory of the CPU, that is, its registers.
Objectives:
By the end of Unit 7, you should be able to:
1. Explain the ALU
2. Discuss the different number representations.
7.2 Arithmetic Logic Unit
The ALU is the part of the CPU that actually performs arithmetic and logical
operations on data. All of the other elements of the computer system -
control unit, registers, memory, I/O - are there mainly to bring data into the
ALU for it to process and then to take the results back out.
Figure 7.1: ALU Inputs and Outputs
The inputs and outputs of the ALU are shown in Figure 7.1. The inputs to
the ALU are the control signals generated by the control unit of the CPU
and the registers of the CPU where the operands are stored. The outputs
are a register called the status word or flag register, which reflects the
result, and the registers of the CPU where the result can be stored. Thus
data are presented to the ALU in registers, and the results of an operation
are also stored in registers. These registers are connected by signal paths
to the ALU. The ALU does not interact directly with memory or other parts
of the system (e.g. I/O modules); it interacts directly only with registers. An
ALU, like all other electronic components of a computer, is
based on the use of simple digital devices that store binary digits and
perform Boolean logic operations.
The control unit is responsible for moving data to memory or I/O
modules. Also, it is the control unit that signals all the operations that
happen in the CPU. The operations, functions and implementation of
Control Unit will be discussed in the tenth unit.
In this unit we will concentrate on the ALU. An important part of the use of
logic circuits is for computing various mathematical operations such as
addition, multiplication, trigonometric operations, etc. Hence we will be
discussing the arithmetic involved in using ALU.
First, before discussing the computer arithmetic we must have a way of
representing numbers as binary data.
7.3 Number Representations
Computers are built using logic circuits that operate on information
represented by two valued electrical signals. We label the two values as 0
and 1; and we define the amount of information represented by such a
signal as a bit of information, where bit stands for binary digit. The most
natural way to represent a number in a computer system is by a string of
bits, called a binary number. A text character can also be represented by a
string of bits called a character code. We will first describe binary number
representations and arithmetic operations on these numbers, and then
describe character representations.
Non-negative Integers
The easiest numbers to represent are the non-negative integers. To see
how this can be done, recall how we represent a number in the decimal
system. A number such as 2034 is interpreted as:
2*10^3 + 0*10^2 + 3*10^1 + 4*10^0
But there is nothing special with the base 10, so we can just as well use
base 2. In base 2, each digit value is either 0 or 1, which we can represent,
for instance, by false and true, respectively.
In fact, we have already hinted at this possibility, since we usually write 0
and 1 instead of false and true.
All the normal algorithms for decimal arithmetic have versions for binary
arithmetic, except that they are usually simpler.
Negative Integers
Things are easy as long as we stick to non-negative integers. They become
more complicated when we want to represent negative integers as well.
In binary arithmetic, we simply reserve one bit to determine the sign. In the
circuitry for addition, we would have one circuit for adding two numbers, and
another for subtracting two numbers. The combination of signs of the two
inputs would determine which circuit to use on the absolute values, as well
as the sign of the output.
While this method works, it turns out that there is one that is much easier to
deal with by electronic circuits. This method is called the ‘two's
complement’ method. It turns out that with this method, we do not need a
special circuit for subtracting two numbers.
In order to explain this method, we first show how it would work in decimal
arithmetic with infinite precision. Then we show how it works with binary
arithmetic, and finally how it works with finite precision.
Infinite-Precision Ten's Complement
Imagine the odometer of an automobile. It has a certain number of wheels,
each with the ten digits on it. When one wheel goes from 9 to 0, the wheel
immediately to the left of it advances by one position. If that wheel already
showed 9, it too goes to 0 and advances the wheel to its left, etc.
Now suppose we have an odometer with an infinite number of wheels. We
are going to use this infinite odometer to represent all the integers.
When all the wheels are 0, we interpret the value as the integer 0.
A positive integer n is represented by an odometer position obtained by
advancing the rightmost wheel n positions from 0. Notice that for each such
positive number, there will be an infinite number of wheels with the value 0
to the left.
A negative integer n is represented by an odometer position obtained by
decreasing the rightmost wheel n positions from 0. Notice that for each such
negative number, there will be an infinite number of wheels with the value 9
to the left.
In fact, we don't need an infinite number of wheels. For each number only a
finite number of wheels are needed. We simply assume that the leftmost
wheel (which will be either 0 or 9) is duplicated an infinite number of times to
the left.
While for each number we only need a finite number of wheels, the number
of wheels is unbounded, i.e., we cannot use a particular finite number of
wheels to represent all the numbers. The difference is subtle but important
(but perhaps not that important for this particular course). If we need an
infinite number of wheels, then there is no hope of ever using this
representation in a program, since that would require an infinite-size
memory. If we only need an unbounded number of wheels, we may run out
of memory, but we can represent a lot of numbers (each of finite size) in a
useful way. Since any program that runs in finite time only uses a finite
number of numbers, with a large enough memory, we might be able to run
our program.
Now suppose we have an addition circuit that can handle non-negative
integers with an infinite number of digits. In other words, when given a
number starting with an infinite number of 9s, it will interpret this as an
infinitely large positive number, whereas our interpretation of it will be a
negative number.
Let us say, we give this circuit the two numbers ...9998 (which we interpret
as -2) and ...0005 (which we interpret as +5). It will add the two numbers.
First it adds 8 and 5 which gives 3 and a carry of 1. Next, it adds 9 and the
carry 1, giving 0 and a carry of 1. For all remaining (infinitely many)
positions, the value will be 0 with a carry of 1, so the final result is ...0003.
This result is the correct one, even with our interpretation of negative
numbers. You may argue that the carry must end up somewhere, and it
does, but in infinity. In some ways, we are doing arithmetic modulo infinity.
Some implementations of some programming languages with arbitrary
precision integer arithmetic (Lisp for instance) use exactly this
representation of negative integers.
Finite-Precision Ten's Complement
What we have said in the previous section works almost as well with a fixed
bounded number of odometer wheels. The only problem is that we have to
deal with overflow and underflow.
Suppose we have only a fixed number of wheels say 3. In this case, we
shall use the convention that if the leftmost wheel shows a digit between 0
and 4 inclusive, then we have a positive number, equal to its representation.
When instead the leftmost wheel shows a digit between 5 and 9 inclusive,
we have a negative number, whose absolute value can be computed with
the method described in the previous section.
We now assume that we have a circuit that can add positive three-digit
numbers, and we shall see how we can use it to add negative numbers in
our representation.
Suppose, again we want to add -2 and +5. The representations for these
numbers with three wheels are 998 and 005 respectively. Our addition
circuit will attempt to add the two positive numbers 998 and 005, which
gives 1003. But since the addition circuit only has three digits, it will truncate
the result to 003, which is the right answer for our interpretation.
A valid question at this point is in which situation our finite addition circuit will
not work. The answer is somewhat complicated. It is clear that it always
gives the correct result when a positive and a negative number are added. It
is incorrect in two situations. The first situation is when two positive numbers
are added, and the result comes out looking like a negative number, i.e. with
a first digit somewhere between 5 and 9. You should convince yourself that
no addition of two positive numbers can yield an overflow and still look like a
positive number. The second situation is when two negative numbers are
added and the result comes out looking like a non-negative number, i.e. with
a first digit somewhere between 0 and 4. Again, you should convince
yourself that no addition of two negative numbers can yield an underflow
and still look like a negative number.
We now have a circuit for addition of integers (positive or negative) in our
representation. We simply use a circuit for addition of only positive numbers,
plus some circuits that check:
If both numbers are positive and the result is negative, then report
overflow.
If both numbers are negative and the result is positive, then report
underflow.
Finite-Precision Two's Complement
So far, we have studied the representation of negative numbers using ten's
complement. In a computer, we prefer using base two rather than base ten.
Luckily, the exact method described in the previous section works just as
well for base two. For an n-bit adder (n is usually 32 or 64), we can
represent positive numbers with a leftmost digit of 0, which gives values
between 0 and 2^(n-1) - 1, and negative numbers with a leftmost digit of 1,
which gives values between -2^(n-1) and -1.
Exactly the same rule works for overflow and underflow detection. If, when
adding two positive numbers, we get a result that looks negative (i.e. with its
leftmost bit 1), then we have an overflow. Similarly, if, when adding two
negative numbers, we get a result that looks positive (i.e. with its leftmost bit
0), then we have an underflow.
Rational Numbers
Integers are useful, but sometimes we need to compute with numbers that
are not integer.
An obvious idea is to use rational numbers. Many algorithms, such as the
simplex algorithm for linear optimization, use only rational arithmetic
whenever the input is rational.
There is no particular difficulty in representing rational numbers in a
computer. It suffices to have a pair of integers, one for the numerator and
one for the denominator.
To implement arithmetic on rational numbers, we can use some additional
restrictions on our representation. We may, for instance, decide that:
Positive rational numbers are always represented as two positive
integers (the other possibility is as two negative numbers),
Negative rational numbers are always represented with a negative
numerator and a positive denominator (the other possibility is with a
positive numerator and a negative denominator),
The numerator and the denominator are always relative prime (they
have no common factors).
Such a set of rules makes sure that our representation is canonical, i.e., that
the representation for a value is unique, even though, a priori, many
representations would work.
Circuits for implementing rational arithmetic would have to take such rules
into account. In particular, the last rule would imply dividing the two integers
resulting from every arithmetic operation with their largest common factor to
obtain the canonical representation.
Rational numbers and rational arithmetic are not very common in the
hardware of a computer. The reason is probably that rational numbers don't
behave very well with respect to the size of the representation. For rational
numbers to be truly useful, their components, i.e., the numerator and the
denominator both need to be arbitrary-precision integers. As we have
mentioned before, arbitrary precision anything does not go very well with
fixed-size circuits inside the CPU of a computer.
Programming languages, on the other hand, sometimes use arbitrary-
precision rational numbers. This is the case, in particular, with the language
Lisp.
7.4 Summary
The ALU operates only on data held in the registers that are internal to the
CPU. All computers deal with numbers. We have studied the instructions
that perform basic arithmetic operations on data operands in the previous
unit. To understand the task carried out by the ALU, we have introduced the
representation of numbers in a computer and how they are manipulated in
addition and subtraction operations.
Self Assessment Questions
1. In binary arithmetic, we simply reserve ___________ bit to determine
the sign.
2. The infinite precision ten’s complement of -2 and +5 is ___________
and ___________.
3. The finite precision three digit ten’s complement of -2 and +5 is
___________ and ___________.
4. Using finite precision ten’s complement, if both numbers for addition are
positive and the result is negative, then the circuit reports ___________.
5. ___________ is not very common in the hardware of a computer.
7.5 Terminal Questions
1. Explain the ALU operations.
2. Discuss various number representations in a computer system.
7.6 Answers
Self Assessment Questions:
1. one
2. 98, 05
3. 998, 005
4. overflow
5. rational numbers and rational arithmetic
Terminal Questions:
1. Refer Section 7.2
2. Refer Section 7.3
Unit 8 Binary Arithmetic
Structure:
8.1 Introduction
Objectives
8.2 Binary Arithmetic
Overflow in Integer Arithmetic
Binary Addition
Subtraction
Another Note on Overflow
Multiplication
Unsigned Integer Multiplication:
Straightforward Method
Unsigned Integer Multiplication:
A More Efficient Method
Positive Integer Multiplication
Signed Integer Multiplication
Division
8.3 Floating Point Numbers
Floating Point Variables
Floating Point Arithmetic
Addition of Floating-Point Numbers
Time for Floating-Point Addition
Pipelined Floating-Point Addition
8.4 Real Numbers
8.5 Summary
8.6 Terminal Questions
8.7 Answers
8.1 Introduction
In this unit we focus on how various arithmetic operations like addition,
subtraction, multiplication, and division of binary numbers are performed in
computers. It also discusses the representation of floating point numbers
and rational numbers.
Objectives:
By the end of Unit 8, you should be able to:
1. Compute the addition of signed and unsigned integers.
2. Compute the addition of floating point numbers.
3. Compute the multiplication and division of signed and unsigned integers.
8.2 Binary Arithmetic
Inside a computer system, all operations are carried out on fixed-length
binary values that represent application-oriented values. The schemes used
to encode the application information have an impact on the algorithms for
carrying out the operations. The unsigned (binary number system) and
signed (2’s complement) representations have the advantage that addition
and subtraction operations have simple implementations, and that the same
algorithm can be used for both representations. This note discusses
arithmetic operations on fixed-length binary strings, and some issues in
using these operations to manipulate information.
It might be reasonable to hope that the operations performed by a computer
always result in correct answers. The answers are indeed always correct,
but we must be careful about what is meant by correct.
Computers manipulate fixed-length binary values to produce fixed-length
binary values. The computed values are correct according to the algorithms
that are used; however, it is not always the case that the computed value is
correct when the values are interpreted as representing application
information. Programmers must appreciate the difference between
application information and fixed-length binary values in order to recognize
when a computed value correctly represents application information!
A limitation in the use of fixed-length binary values to represent application
information is that only a finite set of application values can be represented
by the binary values. What happens if applying an operation on values
contained in the finite set results in an answer that is outside the set? For
example, suppose that 4-bit values are used to encode counting numbers,
thereby restricting the set of represented numbers to 0 .. 15. The values 4
and 14 are inside the set of represented values. Performing the operation
4 + 14 should result in 18; however, 18 is outside the set of represented
numbers. This situation is called overflow, and programs must always be
written to deal with potential overflow situations.
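The 4-bit situation above can be sketched in a few lines of Python; the function name is ours, for illustration:

```python
def add4_unsigned(x, y):
    """4-bit unsigned addition: keep the low 4 bits, and flag overflow via
    the carry out of the most significant bit."""
    total = x + y
    return total & 0b1111, total > 0b1111

print(add4_unsigned(4, 14))   # (2, True): 18 lies outside 0..15
print(add4_unsigned(4, 3))    # (7, False)
```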
Overflow in Integer Arithmetic
Using four bits, the range of numbers that can be represented is -8 through
+7. When the result of an arithmetic operation is outside the representable
range, an arithmetic overflow has occurred.
When adding unsigned numbers, the carry-out, cn, from the most significant
bit position serves as the overflow indicator. However, this does not work for
adding signed numbers. For example, when using 4-bit signed numbers, if
we try to add the numbers +7 and +4, the output sum vector, S, is 1011,
which is the code for -5, an incorrect result. The carry-out signal from the
MSB position is 0. Similarly, if we try to add -4 and -6, we get S = 0110 = +6,
another incorrect result, and in this case, the carry-out signal is 1. Thus,
overflow may occur if both summands have the same sign. Clearly, the
addition of numbers with different signs cannot cause overflow. This leads to
the following conclusions:
1. Overflow can occur only when adding two numbers that have the same
sign.
2. The carry-out signal from the sign-bit position is not a sufficient indicator
of over- flow when adding signed numbers.
A simple way to detect overflow is to examine the signs of the two
summands X and Y and the sign of the result. When both operands X and Y
have the same sign, an overflow occurs when the sign of S is not the same
as the signs of X and Y.
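This sign test can be sketched as follows for 4-bit two's-complement values (the helper name is ours):

```python
def add4_signed(x, y):
    """4-bit two's-complement addition with the sign-based overflow test:
    overflow iff both summands share a sign and the result does not."""
    s = (x + y) & 0b1111
    sx, sy, ss = x >> 3 & 1, y >> 3 & 1, s >> 3 & 1
    return s, sx == sy and ss != sx

# +7 and +4: sum bits 1011 (the code for -5), so overflow is flagged
print(add4_signed(0b0111, 0b0100))   # (11, True)
# -4 (1100) and -6 (1010): sum bits 0110 (+6), overflow again
print(add4_signed(0b1100, 0b1010))   # (6, True)
```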
Binary Addition
The binary addition of two bits (a and b) is defined by the table:
a b a + b
0 0 0
0 1 1 carry 0
1 0 1
1 1 0 carry 1
When adding n-bit values, the values are added in corresponding bit-wise
pairs, with each carry being added to the next most significant pair of bits.
The same algorithm can be used when adding pairs of unsigned or pairs of
signed values.
4-Bit Example A:
0 1 0 carry values
value 1: 1 0 1 1
+ value 2: + 0 0 1 0
result: 1 1 0 1
Since computers are constrained to deal with fixed-width binary values, any
carry out of the most significant bit-wise pair is ignored.
4-Bit Example B:
1 1 1 0 carry values
value 1: 1 0 1 1
+ value 2: + 0 1 1 0
result: 1 0 0 0 1
The binary values generated by the addition algorithm are always correct
with respect to the algorithm, but what is the significance when the binary
values are intended to represent application information? Will the operation
yield a result that accurately represents the result of adding the application
values?
First consider the case where the binary values are intended to represent
unsigned integers (i.e. counting numbers). Adding the binary values
representing two unsigned integers will give the correct result (i.e. will yield
the binary value representing the sum of the unsigned integer values)
provided the operation does not overflow – i.e. when the addition operation
is applied to the original unsigned integer values, the result is an unsigned
integer value that is inside the set of unsigned integer values that can be
represented using the specified number of bits (i.e. the result can be
represented under the fixed-width constraints imposed by the
representation).
Reconsider 4-Bit Example A (above) as adding unsigned values:
0 1 0 carry values
value 1: 11₁₀ 1 0 1 1
+ value 2: + 2₁₀ + 0 0 1 0
result: 13₁₀ 1 1 0 1
In this case, the binary result (1101₂) of the operation accurately represents
the unsigned integer sum (13) of the two unsigned integer values being
added (11 + 2), and therefore, the operation did not overflow. But what
about 4-Bit Example B (above)?
1 1 1 0 carry values
value 1: 11₁₀ 1 0 1 1
+ value 2: + 6₁₀ + 0 1 1 0
result: 17₁₀ 0 0 0 1 ???? 11 + 6 = 1 ?????
When the values added in Example B are considered as unsigned values,
then the 4-bit result (1) does not accurately represent the sum of the
unsigned values (11 + 6)! In this case, the operation has resulted in
overflow: the result (17) is outside the set of values that can be represented
using 4-bit binary number system values (i.e. 17 is not in the set {0 , … ,
15}). The result (0001₂) is correct according to the rules for performing
binary addition using fixed-width values, but truncating the carry out of the
most significant bit resulted in the loss of information that was important to
the encoding being used. If the carry had been kept, then the 5-bit result
(10001₂) would have represented the unsigned integer sum correctly.
But more can be learned about overflow from the above examples! Now
consider the case where the binary values are intended to represent signed
integers.
Reconsider 4-Bit Example A (above) as adding signed values:
0 1 0 carry values
value 1: – 5₁₀ 1 0 1 1
+ value 2: + 2₁₀ + 0 0 1 0
result: – 3₁₀ 1 1 0 1 no overflow !
In this case, the binary result (1101₂) of the operation accurately represents
the signed integer sum (– 3) of the two signed integer values being added
(– 5 + 2) therefore, the operation did not overflow. What about 4-Bit
Example B?
1 1 1 0 carry values
value 1: – 5₁₀ 1 0 1 1
+ value 2: + 6₁₀ + 0 1 1 0
result: 1₁₀ 0 0 0 1 no overflow!
In this case, the result (again) represents the signed integer answer
correctly, and therefore, the operation did not overflow.
Recall that in the unsigned case, Example B resulted in overflow. In the
signed case, Example B did not overflow. This illustrates an important
concept: overflow is interpretation dependent! The concept of overflow
depends on how information is represented as binary values. Different types
of information are encoded differently, yet the computer performs a specific
algorithm, regardless of the possible interpretations of the binary values
involved. It should not be surprising that applying the same algorithm to
different interpretations may have different overflow results.
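The two readings of Example B can be checked side by side in a short sketch (helper name is ours):

```python
def as_signed(v):
    """Interpret a 4-bit pattern as a two's-complement value."""
    return v - 16 if v & 0b1000 else v

# Example B revisited: the same bit pattern, two interpretations.
a_bits, b_bits = 0b1011, 0b0110
s_bits = (a_bits + b_bits) & 0b1111          # fixed-width 4-bit sum: 0001

# Unsigned view: 11 + 6 = 17, but the 4-bit result reads 1 -> overflow.
print(s_bits, a_bits + b_bits)               # 1 17
# Signed view: -5 + 6 = 1, and the result reads 1 -> no overflow.
print(as_signed(s_bits), as_signed(a_bits) + as_signed(b_bits))  # 1 1
```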
Subtraction
The binary subtraction of two bits (a and b) is defined by the table:
a b a – b
0 0 0
1 0 1 borrow 0
1 1 0
0 1 1 borrow 1
When subtracting n-bit values, the values are subtracted in corresponding
bit-wise pairs; with each borrow rippling down from the more significant bits
as needed. If none of the more significant bits contains a 1 to be borrowed,
then 1 may be borrowed into the most significant bit.
4-Bit Example C:
must borrow from the second digit
1 0 1 0
– 0 0 0 1
becomes (the 1 in the twos place is borrowed, leaving 0 there and 10 in the ones place):
Interpretations: unsigned signed
1 0 0 10 10 – 6
– 0 0 0 1 – 1 – 1
1 0 0 1 9 – 7
no overflow in either case
4-Bit Example D:
must borrow from above the most significant digit
0 0 0 1
– 1 1 1 1
becomes (borrows ripple in from above the most significant digit):
Interpretations: unsigned signed
10 10 10 1 1 1
– 1 1 1 1 – 15 – (–1)
0 0 1 0 2 2
overflow in the unsigned case; no overflow in the signed case
Most computers apply the mathematical identity:
a – b = a + ( – b )
to perform subtraction by negating the second value (b) and then adding.
This can result in a saving in transistors since there is no need to implement
a subtraction circuit.
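A sketch of this negate-and-add scheme for 4-bit values (function names are ours):

```python
def negate4(b):
    """Two's-complement negation of a 4-bit value: invert the bits, add 1."""
    return (~b + 1) & 0b1111

def sub4(a, b):
    """a - b computed as a + (-b), so no separate subtraction circuit."""
    return (a + negate4(b)) & 0b1111

print(sub4(0b1010, 0b0001))   # 9 (binary 1001), matching Example C
```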
Another Note on Overflow
Are there easy ways to decide whether an addition or subtraction results in
overflow? Yes, but we should be careful that we understand the concept,
and don’t rely on memorizing case rules that allow the occurrence of
overflow to be identified.
For unsigned values, a carry out of (or a borrow into) the most significant bit
indicates that overflow has occurred.
For signed values, overflow has occurred when the sign of the result is
impossible for the signs of the values being combined by the operation. For
example, overflow has occurred if:
Two positive values are added and the sign of the result is negative
A negative value is subtracted from a positive value and the result is
negative (a positive minus a negative is the same as a positive plus a
positive, and should result in a positive value, i.e. a – ( – b) = a + b )
These are just two examples of some of the possible cases for signed
overflow.
Note that it is not possible to overflow for some signed values. For example,
adding a positive and a negative value will never overflow. To convince
yourself of why this is the case, picture the two values on a number line:
suppose that a is a negative value, and b is a positive value. Adding the two
values, a + b, will result in c such that c always lies between a and b on the
number line. If a and b can be represented prior to the addition, then c can
also be represented, and overflow will never occur.
Multiplication
Multiplication is a slightly more complex operation than addition or
subtraction. Multiplying two n-bit values together can result in a value of up
to 2n bits. To help convince you of this, think about decimal numbers:
multiplying two 1-digit numbers together results in a 1- or 2-digit result, but
cannot result in a 3-digit result (the largest product possible is 9 x 9 = 81).
What about multiplying two 2-digit numbers? Does this extrapolate to n-digit
numbers? To further complicate matters, there is a reasonably simple
algorithm for multiplying binary values that represent unsigned integers, but
the same algorithm cannot be applied directly to values that represent
signed values (this is different from addition and subtraction where the same
algorithms can be applied to values that represent unsigned or signed
values!).
Overflow is not an issue in n-bit unsigned multiplication, provided that 2n
bits of result are kept.
Now consider the multiplication of two unsigned 4-bit values a and b. The
value b can be rewritten in terms of its individual digits:
b = b3 · 2^3 + b2 · 2^2 + b1 · 2^1 + b0 · 2^0
Substituting this into the product a · b gives:
a (b3 · 2^3 + b2 · 2^2 + b1 · 2^1 + b0 · 2^0)
which can be expanded into:
a b3 · 2^3 + a b2 · 2^2 + a b1 · 2^1 + a b0 · 2^0
The possible values of bi are 0 or 1. In the expression above, any term
where bi = 0 resolves to 0 and the term can be eliminated. Furthermore, in
any term where bi = 1, the digit bi is redundant (multiplying by 1 gives the
same value), and therefore the digit bi can be eliminated from the term. The
resulting expression can be written and generalized to n bits:
a · b = SUM(i = 0 to n-1) a · 2^i, taking only those i where bi = 1
This expression may look a bit intimidating, but it turns out to be reasonably
simple to implement in a computer because it only involves multiplying by 2
(and there is a trick that lets computers do this easily!). Multiplying a value
by 2 in the binary number system is analogous to multiplying a value by 10
in the decimal number system. The result has one new digit: a 0 is injected
as the new least significant digit, and all of the original digits are shifted to
the left as the new digit is injected.
Think in terms of a decimal example, say: 37 × 10 = 370. The original value
is 37 and the result is 370. The result has one more digit than the original
value, and the new digit is a 0 that has been injected as the least significant
digit. The original digits (37) have been shifted one digit to the left to admit
the new 0 as the least significant digit.
The same rule holds good for multiplying by 2 in the binary number system.
For example: 101₂ × 2 = 1010₂. The original value of 5 (101₂) is multiplied
by 2 to give 10 (1010₂). The result can be obtained by shifting the original
value left one digit and injecting a 0 as the new least significant digit.
The calculation of a product can be reduced to summing terms of the form
a · 2^i. The multiplication by 2^i can be reduced to shifting left i times! The
shifting of binary values in a computer is very easy to do, and as a result,
the calculation can be reduced to a series of shifts and adds.
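In Python, the left-shift operator makes this "trick" explicit:

```python
a = 0b101                    # 5
# Shifting left one bit injects a 0 as the new LSB: multiply by 2.
print(bin(a << 1))           # 0b1010 (= 10)
# Shifting left i bits multiplies by 2**i.
print(bin(a << 3))           # 0b101000 (= 40)
```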
Unsigned Integer Multiplication: Straightforward Method
To compute: a * b
Where
Register A contains a = an-1an-2...a1a0
Register B contains b = bn-1bn-2...b1b0
and where register P is a register twice as large as A or B
Simple Multiplication Algorithm: Straightforward Method
Steps:
1. If LSB(A) = 1, then set Pupper to Pupper + b
2. Shift the register A right,
using zero sign extension (for unsigned values) and
forcing the LSB(A) to fall off the lower end
3. Shift the double register P right,
using zero sign extension (for unsigned values) and
pushing the LSB(Pupper) into the MSB(Plower)
After n times (for n-bit values),
the full contents of the double register P = a * b
Example of Simple Multiplication Algorithm (Straightforward Method)
Multiply b = 2 = 0010₂ by a = 3 = 0011₂
(Answer should be 6 = 0110₂)
P A B Comments
0000 0000 0011 0010 Start: A = 0011; B = 0010; P = (0000, 0000)
0010 0000 0011 0010 LSB(A) = 1 ==> Add b to P
0010 0000 0001 0010 Shift A right
0001 0000 0001 0010 Shift P right
0011 0000 0001 0010 LSB(A) = 1 ==> Add b to P
0011 0000 0000 0010 Shift A right
0001 1000 0000 0010 Shift P right
0001 1000 0000 0010 LSB(A) = 0 ==> Do nothing
0001 1000 0000 0010 Shift A right
0000 1100 0000 0010 Shift P right
0000 1100 0000 0010 LSB(A) = 0 ==> Do nothing
0000 1100 0000 0010 Shift A right
0000 0110 0000 0010 Shift P right
0000 0110 0000 0010 Done: P is product
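The steps above can be sketched in Python; the function below mirrors the table (the name and the n=4 default are ours):

```python
def multiply_straightforward(a, b, n=4):
    """Unsigned shift-add multiply, straightforward method: P is a 2n-bit
    register; b is added into Pupper whenever LSB(A) is 1."""
    P = 0
    for _ in range(n):
        if a & 1:                      # LSB(A) = 1
            P += b << n                # add b into Pupper
        a >>= 1                        # shift A right (zero extension)
        P >>= 1                        # shift the double register P right
    return P                           # full 2n-bit product

print(multiply_straightforward(3, 2))  # 6
```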
Unsigned Integer Multiplication: A More Efficient Method
To compute: a * b
where
Register A contains a = an-1an-2...a1a0
Register B contains b = bn-1bn-2...b1b0
and where register P is connected to register A to form a register twice
as large as A or B and register P is the upper part of the double register
(P, A)
Simple Multiplication Algorithm: A More Efficient Method
Steps:
1. If LSB(A) = 1, then set P to P + b
2. Shift the double register (P, A) right,
using zero sign extension (for unsigned values),
forcing the LSB(A) to fall off the lower end and
pushing the LSB(P) into the MSB(A)
After n times (for n-bit values),
the contents of the double register (P, A) = a * b
Example of Simple Multiplication Algorithm (A More Efficient Method)
Multiply b = 2 = 0010₂ by a = 3 = 0011₂
(Answer should be 6 = 0110₂)
P A B Comments
0000 0011 0010 Start: A = 0011; B = 0010; P = (0000, 0000)
0010 0011 0010 LSB(A) = 1 ==> Add b to P
0001 0001 0010 Shift (P, A) right
0011 0001 0010 LSB(A) = 1 ==> Add b to P
0001 1000 0010 Shift (P, A) right
0001 1000 0010 LSB(A) = 0 ==> Do nothing
0000 1100 0010 Shift (P, A) right
0000 1100 0010 LSB(A) = 0 ==> Do nothing
0000 0110 0010 Shift (P, A) right
0000 0110 0010 Done: P is product
By combining registers P and A, we can eliminate one extra shift step per
iteration.
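A sketch of this combined-register variant, treating the 2n-bit (P, A) pair as a single integer (name is ours):

```python
def multiply_combined(a, b, n=4):
    """Shift-add multiply with (P, A) held as one 2n-bit register, so a
    single shift moves both halves per iteration."""
    PA = a                             # A in the low n bits, P (= 0) above
    for _ in range(n):
        if PA & 1:                     # LSB(A)
            PA += b << n               # add b into P, the upper half
        PA >>= 1                       # one shift of the whole (P, A) pair
    return PA                          # (P, A) = a * b

print(multiply_combined(3, 2))         # 6
```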
Positive Integer Multiplication
Shift-Add Multiplication Algorithm
(Same as Straightforward Method above)
For unsigned or positive operands
Repeat the following steps n times:
1. If LSB(A) = 1, then set Pupper to Pupper + b, else set Pupper to Pupper + 0
2. Shift the double register (P, A) right,
using zero sign extension (for positive values) and
forcing the LSB(A) to fall off the lower end
Signed Integer Multiplication
Simplest:
1. Convert negative values of a or b to positive values
2. Multiply both positive values using one of the two algorithms above
3. Adjust the sign of the product appropriately
Alternate: Booth's algorithm
Introduction to Booth's Multiplication Algorithm
A powerful algorithm for signed-number multiplication is the Booth algorithm.
It generates a 2n-bit product and treats both positive and negative numbers
uniformly.
Consider a positive binary number containing a run of ones, e.g., the 8-bit
value 00011110. Multiplying by such a value implies four consecutive
additions of shifted multiplicands:
00010100 20
x 00011110 x 30
00000000 00
00010100 first 60
00010100 second
00010100 third
00010100 fourth
00000000
00000000
00000000
0000001001011000 600
Now 00011110 can be expressed as a difference of powers of two:
00100000 32
– 00000010 – 2
00011110 30
This means that the same multiplication can be obtained using only two
additions:
+ 2^5 x multiplicand of 00010100
- 2^1 x multiplicand of 00010100
Since the 1s complement of 00010100 is 11101011,
Then -00010100 is 11101100 in 2s complement
In other words, using sign extension and ignoring the carry out of the
16-bit result, the same product can be formed from just two rows:
- 2^1 x 00010100 gives 1111111111011000 (– 40)
+ 2^5 x 00010100 gives 0000001010000000 (+ 640)
sum: 1|0000001001011000 = 0000001001011000 (600)
Booth's Recoding Table
Booth recodes the bits of the multiplier a according to the following table:
ai ai-1 Recoded ai
0 0 0
0 1 +1
1 0 -1
1 1 0
Always assume that there is a zero to the right of the multiplier
i.e. that a-1 is zero
so that we can consider the LSB (= a0)
and an imaginary zero bit to its right
So, if a is 00011110 ,
the recoded value of a is
0 0 +1 0 0 0 -1 0
Example of Booth's Multiplication
A. Multiply 00101101 (b) by 00011110 (a) using normal multiplication
00101101 45
x 00011110 x 30
00000000 00
00101101 first 135
00101101 second
00101101 third
00101101 fourth
00000000
00000000
00000000
0000010101000110 1350
B. Multiply 00101101 (b) by 00011110 (a) using Booth's multiplication
Recall the recoded value of 00011110 (a) is 0 0 +1 0 0 0 -1 0
The 1s complement of 00101101 (b) is 11010010
The 2s complement of 00101101 (b) is 11010011
Hence, using sign extension and ignoring the carry out of the 16-bit
result, the product is formed from two rows:
- 2^1 x 00101101 gives 1111111110100110 (– 90)
+ 2^5 x 00101101 gives 0000010110100000 (+ 1440)
sum: 1|0000010101000110 = 0000010101000110 (1350)
Booth's Multiplication Algorithm
Booth's algorithm chooses between + b and - b depending on the two
current bits of a .
Algorithm:
1. Assume the existence of bit a-1 = 0 initially
2. Repeat n times (for multiplying two n-bit values):
a. At step i:
i. If ai = 0 and ai-1 = 0, add 0 to register P
ii. If ai = 0 and ai-1 = 1, add b to register P (i.e. treat ai as +1)
iii. If ai = 1 and ai-1 = 0, subtract b from register P (i.e. treat ai as -1)
iv. If ai = 1 and ai-1 = 1, add 0 to register P
b. Shift the double register (P, A) right one bit with sign extension
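The algorithm can be sketched as follows; the masking mimics n-bit registers and the arithmetic right shift with sign extension, and from_twos is just a display helper (both names are ours):

```python
def booth_multiply(a, b, n=4):
    """Booth's algorithm for n-bit two's-complement operands; the 2n-bit
    product accumulates in the double register (P, A)."""
    mask = (1 << n) - 1
    P, A = 0, a & mask
    neg_b = (-b) & mask                 # 2s complement of b
    a_prev = 0                          # the imaginary bit a_-1 = 0
    for _ in range(n):
        a_i = A & 1
        if (a_i, a_prev) == (1, 0):     # recoded -1: subtract b
            P = (P + neg_b) & mask
        elif (a_i, a_prev) == (0, 1):   # recoded +1: add b
            P = (P + (b & mask)) & mask
        a_prev = a_i
        # arithmetic right shift of (P, A): LSB(P) moves into MSB(A),
        # and the sign bit of P is duplicated (sign extension)
        A = (A >> 1) | ((P & 1) << (n - 1))
        P = (P >> 1) | (P & (1 << (n - 1)))
    return (P << n) | A                 # 2n-bit two's-complement pattern

def from_twos(v, bits):
    """Display helper: read a bit pattern as a two's-complement value."""
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

print(from_twos(booth_multiply(6, 2), 8))                 # 12
print(from_twos(booth_multiply(-6 & 0xF, -5 & 0xF), 8))   # 30
```

Tracing booth_multiply(6, 2) step by step reproduces the (P, A) rows of the positive-number example below.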
Justification of Booth's Multiplication Algorithm
Consider the operations described above:
ai ai-1 ai-1 - ai Operation
0 0 0 Add 0xb to P
0 1 +1 Add +1xb to P
1 0 -1 Add -1xb to P
1 1 0 Add 0xb to P
Equivalently, the algorithm could state:
2. Repeat n times (for multiplying two n-bit values): At step i, add
(ai-1 - ai) x b to register P
The result of all n steps is the sum:
(a-1 - a0) x 2^0 x b
+ (a0 - a1) x 2^1 x b
+ (a1 - a2) x 2^2 x b ...
+ (an-3 - an-2) x 2^(n-2) x b
+ (an-2 - an-1) x 2^(n-1) x b
where a-1 is assumed to be 0
This sum is equal to b x SUM(i = 0 to n-1) ((ai-1 - ai) x 2^i)
= b(-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0) + b a-1
= b(-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0)
since a-1 = 0
Now consider the representation of a as a 2s complement number.
It can be shown to be the same as
-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0
where an-1 represents the sign of a:
If an-1 = 0, a is a positive number
If an-1 = 1, a is a negative number
Example:
Let w = 1011₂
Then w = -1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = -8 + 2 + 1 = -5
Thus the sum above,
b(-2^(n-1) an-1 + 2^(n-2) an-2 + ... + 2 a1 + a0),
is the same as the 2s complement representation of b x a
Example of Booth's Multiplication Algorithm (Positive Numbers)
Multiply b = 2 = 0010₂ by a = 6 = 0110₂
(Answer should be 12 = 1100₂)
The 2s complement of b is 1110
P A B Comments
0000 0110 [0] 0010 Start: a-1 = 0
0000 0110 [0] 0010 00 ==> Add 0 to P
0000 0011 [0] 0010 Shift right
1110 0011 [0] 0010 10 ==> Subtract b from P
1111 0001 [1] 0010 Shift right
1111 0001 [1] 0010 11 ==> Add 0 to P
1111 1000 [1] 0010 Shift right
0001 1000 [1] 0010 01 ==> Add b to P
0000 1100 [0] 0010 Shift right
0000 1100 [0] 0010 Done: (P, A) is product
Example of Booth's Multiplication Algorithm (Negative Numbers)
Multiply b = -5 = -(0101)₂ = 1011₂ by a = -6 = -(0110)₂ = 1010₂
(Answer should be 30 = 11110₂)
The 2s complement of b is 0101
P A B Comments
0000 1010 [0] 1011 Start: a-1 = 0
0000 1010 [0] 1011 00 ==> Add 0 to P
0000 0101 [0] 1011 Shift right
0101 0101 [0] 1011 10 ==> Subtract b from P
0010 1010 [1] 1011 Shift right
1101 1010 [1] 1011 01 ==> Add b to P
1110 1101 [0] 1011 Shift right
0011 1101 [0] 1011 10 ==> Subtract b from P
0001 1110 [1] 1011 Shift right
0001 1110 [1] 1011 Done: (P, A) is product
Advantages and Disadvantages of Booth's Algorithm
Advantages:
Handles positive and negative numbers uniformly
Efficient when there are long runs of ones in the multiplier
Disadvantages:
Average speed of algorithm is about the same as with the normal
multiplication algorithm.
Worst case operates at a slower speed than the normal multiplication
algorithm.
Division
Terminology: dividend ÷ divisor = quotient & remainder
The implementation of division in a computer raises several practical issues:
For integer division there are two results: the quotient and the
remainder.
The operand sizes (number of bits) to be used in the algorithm must be
considered (i.e. the sizes of the dividend, divisor, quotient and
remainder).
Overflow is not an issue in unsigned multiplication, but is a concern with
division.
As with multiplication, there are differences in the algorithms for signed
vs. unsigned division.
Recall that multiplying two n-bit values can result in a 2n-bit value. Division
algorithms are often designed to be symmetrical with this by specifying:
the dividend as a 2n-bit value
the divisor, quotient and remainder as n-bit values
Once the operand sizes are set, the issue of overflow may be addressed.
For example, suppose that the above operand sizes are used, and that the
dividend value is larger than a value that can be represented in n bits (i.e.
2^n – 1 < dividend). Dividing by 1 (divisor = 1) should result with quotient =
dividend; however, the quotient is limited to n bits, and therefore is
incapable of holding the correct result. In this case, overflow would occur.
Sign Extension / Zero Extension
When loading a 16-bit value into a 32-bit register,
How is the sign retained?
By loading the 16-bit value into the lower 16 bits of the 32-bit register and
By duplicating the MSB of the 16-bit value throughout the upper 16 bits of
the 32-bit register
This is called sign extension
If instead, the upper bits are always set to zero,
It is called zero extension
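Both extensions can be sketched with plain integer operations (function names are ours):

```python
def sign_extend(value, from_bits, to_bits):
    """Duplicate the MSB of a from_bits-wide pattern through the upper bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1) & 1:                      # MSB is set
        value |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return value

def zero_extend(value, from_bits, to_bits):
    """The upper bits are simply set to zero."""
    return value & ((1 << from_bits) - 1)

print(hex(sign_extend(0x8001, 16, 32)))   # 0xffff8001
print(hex(zero_extend(0x8001, 16, 32)))   # 0x8001
```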
Unsigned Integer Division
To compute: a / b
where
Register A contains a = an-1an-2...a1a0
Register B contains b = bn-1bn-2...b1b0
and where register P is connected to register A to form a register twice as
large as A or B and register P is the upper part of the double register as
done in many of the Multiplication methods.
Simple Unsigned Division
– for unsigned operands
Steps:
1. Shift the double register (P, A) one bit left,
using zero sign extension (for unsigned values)
and forcing the MSB(P) to fall off the upper end
2. Subtract b from P
3. If the result is negative, then set LSB(A) to 0, else set LSB(A) to 1
4. If the result is negative, set P to P + b (restore)
After repeating these steps n times (for n-bit values), the contents of register
A = a / b, and the contents of register P = remainder (a / b)
This is also called restoring division
Restoring Division Example
Divide 14 = 1110₂ by 3 = 0011₂
P A B Comments
+00000 1110 0011 Start
+00001 1100 0011 Shift left
-00010 1100 0011 Subtract b
+00001 1100 0011 Restore
+00001 1100 0011 Set LSB(A) to 0
+00011 1000 0011 Shift left
+00000 1000 0011 Subtract b
+00000 1001 0011 Set LSB(A) to 1
+00001 0010 0011 Shift left
-00010 0010 0011 Subtract b
+00001 0010 0011 Restore
+00001 0010 0011 Set LSB(A) to 0
+00010 0100 0011 Shift left
-00001 0100 0011 Subtract b
+00010 0100 0011 Restore
+00010 0100 0011 Set LSB(A) to 0
+00010 0100 0011 Done
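A sketch of restoring division mirroring the table above (the name and n=4 default are ours):

```python
def restoring_divide(a, b, n=4):
    """Restoring division of an n-bit unsigned dividend a by divisor b.
    (P, A) starts as (0, a); after n steps A is the quotient, P the remainder."""
    P, A = 0, a
    for _ in range(n):
        # shift (P, A) left: MSB(A) moves into LSB(P)
        P = (P << 1) | ((A >> (n - 1)) & 1)
        A = (A << 1) & ((1 << n) - 1)
        P -= b                          # trial subtraction
        if P < 0:
            P += b                      # negative: restore, LSB(A) stays 0
        else:
            A |= 1                      # non-negative: set LSB(A) to 1
    return A, P

print(restoring_divide(14, 3))          # (4, 2)
```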
Restoring versus Non-restoring Division
Let r = the contents of (P, A)
At each step, the restoring algorithm computes
P = (2r - b)
If (2r - b) < 0,
then
(P, A) = 2r (restored)
(P, A) = 4r (shifted left)
(P, A) = 4r - b (subtract b for next step)
If (2r - b) < 0
and there is no restoring,
then
(P, A) = 2r - b (not restored)
(P, A) = 4r - 2b (shifted left)
(P, A) = 4r - b (add b for next step)
Non-restoring Division Algorithm
Steps:
1. If P is negative,
a) Shift the double register (P, A) one bit left
b) Add b to P
2. Else if P is not negative,
a) Shift the double register (P, A) one bit left
b) Subtract b from P
3. If P is negative, then set LSB(A) to 0 else set LSB(A) to 1
After repeating these steps n times (for n-bit values),
if P is negative, do a final restore
i.e. add b to P
Then
the contents of register A = a / b, and
the contents of register P = remainder (a / b)
Non-restoring Division Example
Divide 14 = 1110₂ by 3 = 0011₂
P A B Comments
00000 1110 0011 Start
00001 1100 0011 Shift left
11110 1100 0011 Subtract b
11110 1100 0011 P negative; Set LSB(A) to 0
11101 1000 0011 Shift left
00000 1000 0011 P negative; Add b
00000 1001 0011 P positive; Set LSB(A) to 1
00001 0010 0011 Shift left
11110 0010 0011 Subtract b
11110 0010 0011 P negative; Set LSB(A) to 0
11100 0100 0011 Shift left
11111 0100 0011 P negative; Add b
11111 0100 0011 P negative; Set LSB(A) to 0
00010 0100 0011 P was negative; final restore (add b)
00010 0100 0011 Done
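And a sketch of the non-restoring variant, with a single add or subtract per step and the final restore (name is ours):

```python
def nonrestoring_divide(a, b, n=4):
    """Non-restoring division: one add or subtract per step instead of a
    subtract plus a conditional restore."""
    P, A = 0, a
    for _ in range(n):
        # shift (P, A) left: MSB(A) moves into LSB(P)
        P = (P << 1) | ((A >> (n - 1)) & 1)
        A = (A << 1) & ((1 << n) - 1)
        if P < 0:
            P += b                      # P negative: add b
        else:
            P -= b                      # P non-negative: subtract b
        if P < 0:
            A &= ~1                     # set LSB(A) to 0
        else:
            A |= 1                      # set LSB(A) to 1
    if P < 0:                           # final restore if needed
        P += b
    return A, P

print(nonrestoring_divide(14, 3))       # (4, 2)
```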
Division by Multiplication
There are numerous algorithms for division many of which do not relate as
well to the division methods learned in elementary school One of these is
Division by Multiplication It is also known as Goldschmidt's algorithm.
If we wish to divide N by D, then we are looking for Q such that Q = N/D
Since N/D is a fraction, we can multiply both the numerator and denominator
by the same value, x, without changing the value of Q. This is the basic idea
of the algorithm. We wish to find some value x such that Dx becomes close
to 1 so that we only need to compute Nx To find such an x, first scale D by
shifting it right or left so that 1/2 <= D < 1
Let s = #shifts, positive if to the left
and negative otherwise.
Call this D' (= D · 2^s).
Now compute x such that x = 1 - D'.
Call this Z.
Notice that 0 < Z <= 1/2
since Z = 1 - D'.
Furthermore, D' = 1 - Z.
Then
Q = N / D
= N (1+Z) / D (1+Z)
= 2s N (1+Z) / 2s D (1+Z)
= 2s N (1+Z) / D' (1+Z)
= 2s N (1+Z) / (1-Z) · (1+Z)
= 2s N (1+Z) / (1-Z2)
Similarly,
Q = N / D
  = 2^s N (1+Z) / (1-Z^2)
  = 2^s N (1+Z) (1+Z^2) / (1-Z^2) (1+Z^2)
  = 2^s N (1+Z) (1+Z^2) / (1-Z^4)
And,
Q = N / D
  = 2^s N (1+Z) (1+Z^2) / (1-Z^4)
  = 2^s N (1+Z) (1+Z^2) (1+Z^4) / (1-Z^4) (1+Z^4)
  = 2^s N (1+Z) (1+Z^2) (1+Z^4) / (1-Z^8)
continuing to
Q = N / D
  = 2^s N (1+Z) (1+Z^2) (1+Z^4) · · · (1+Z^(2^(n-1))) / (1-Z^(2^n))
Since 0 < Z <= 1/2,
Z^i goes to zero as i goes to infinity. This means that the denominator,
1-Z^(2^n), goes to 1 as n gets larger. Since the denominator goes to 1, we
need only compute the numerator in order to determine (or approximate) the
quotient, Q.
So, Q ≈ 2^s N (1+Z) (1+Z^2) (1+Z^4) · · · (1+Z^(2^(n-1)))
Examples of Division by Multiplication
Example 1:
Let's start with a simple example in the decimal system.
Suppose N = 1.8 and D = 0.9.
(Clearly, the answer is 2.)
Since D = 0.9, it does not need to be shifted,
so the number of shifts, s, is zero.
And since we are in the decimal system,
we use 10^s in the equation for Q.
Furthermore, Z = 1 - D = 1 - 0.9 = 0.1.
For n = 0,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8
  = 1.8
For n = 1,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8 (1 + Z)
  = 1.8 (1 + 0.1)
  = 1.8 · 1.1
  = 1.98
For n = 2,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8 (1 + Z) (1 + Z^2)
  = 1.8 (1 + 0.1) (1 + 0.01)
  = 1.8 · 1.1 · 1.01
  = 1.98 · 1.01
  = 1.9998
For n = 3,
Q = N / D = 1.8 / 0.9
  ≈ 10^0 · 1.8 (1 + Z) (1 + Z^2) (1 + Z^4)
  = 1.8 (1 + 0.1) (1 + 0.01) (1 + 0.0001)
  = 1.8 · 1.1 · 1.01 · 1.0001
= 1.98 · 1.01 · 1.0001
= 1.9998 · 1.0001
= 1.99999998
And so on, getting closer and closer to 2.0 with each increase of n.
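The same iteration is easy to try out in code. The sketch below uses ordinary Python floats rather than the fixed-point fractions real hardware would use, and the function name is mine; it scales D into [1/2, 1), forms Z = 1 - D', and multiplies the scaled numerator by successive (1 + Z^(2^i)) factors:

```python
def goldschmidt_divide(N, D, iterations=5):
    """Division by multiplication (Goldschmidt): a float-based sketch."""
    # Scale D into [1/2, 1); s counts left shifts (negative = right shifts).
    s, d = 0, D
    while d < 0.5:
        d *= 2
        s += 1
    while d >= 1.0:
        d /= 2
        s -= 1
    Z = 1.0 - d                  # 0 < Z <= 1/2
    q = N * (2.0 ** s)           # numerator scaled by 2^s
    for _ in range(iterations):
        q *= 1.0 + Z             # multiply by (1 + Z^(2^i))
        Z = Z * Z                # square Z for the next factor
    return q

print(goldschmidt_divide(1.8, 0.9, 3))   # ≈ 1.99999998, approaching 2.0
print(goldschmidt_divide(7.0, 2.0, 10))  # ≈ 3.5
```

Each extra iteration squares Z, so the error shrinks quadratically, which is why only a handful of iterations are needed in practice.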
8.3 Floating Point Numbers
Instead of using the obvious representation of rational numbers presented in
the previous section, most computers use a different representation of a
subset of the rational numbers. We call these numbers floating-point
numbers.
Floating-point numbers use inexact arithmetic, and in return require only a
fixed-size representation. For many computations (so-called scientific
computations, as if other computations weren't scientific) such a
representation has the great advantage that it is fast, while at the same time
usually giving adequate precision.
There are some (sometimes spectacular) exceptions to the “adequate
precision” statement in the previous paragraph, though. As a result, an
entire discipline of applied mathematics, called numerical analysis, has been
created for the purpose of analyzing how algorithms behave with respect to
maintaining adequate precision, and of inventing new algorithms with better
properties in this respect.
The basic idea behind floating-point numbers is to represent a number as a
mantissa and an exponent, each with a fixed number of bits of precision. If
we denote the mantissa with m and the exponent with e, then the number
thus represented is m · 2^e.
Again, we have a problem that a number can have several representations.
To obtain a canonical form, we simply add a rule that m must be greater
than or equal to 1/2 and strictly less than 1. If we write such a mantissa in
binal (analogous to decimal) form, we always get a number that starts with
0.1. This initial information therefore does not have to be represented, and
we represent only the remaining “binals”.
The reason floating-point representations work well for so-called scientific
applications is that we more often need to multiply or divide two numbers.
Multiplication of two floating-point numbers is easy to obtain. It suffices to
multiply the mantissas and add the exponents. The resulting mantissa might
be smaller than 1/2, in fact, it can be as small as 1/4. In this case, the result
needs to be canonicalized. We do this by shifting the mantissa left by one
position and subtracting one from the exponent. Division is only slightly
more complicated. Notice that the imprecision in the result of a multiplication
or a division is only due to the imprecision in the original operands. No
additional imprecision is introduced by the operation itself (except possibly 1
unit in the least significant digit). Floating-point addition and subtraction do
not have this property.
To add two floating-point numbers, the one with the smallest exponent must
first have its mantissa shifted right by n steps, where n is the difference of
the exponents. If n is greater than the number of bits in the representation of
the mantissa, the second number will be treated as 0 as far as addition is
concerned. The situation is even worse for subtraction (or addition of one
positive and one negative number). If the numbers have roughly the same
absolute value, the result of the operation is roughly zero, and the resulting
representation may have no correct significant digits.
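Both effects described above are easy to demonstrate with ordinary double-precision arithmetic:

```python
# Absorption: when the exponent difference exceeds the significand
# width, the smaller operand is shifted out entirely during alignment.
big = 2.0 ** 53
print(big + 1.0 == big)       # True: the 1.0 vanishes

# Cancellation: subtracting nearly equal numbers can leave no
# correct significant digits at all.
print((1.0 + 1e-16) - 1.0)    # 0.0, although the exact answer is 1e-16
```

In the second case 1e-16 is below half a unit in the last place of 1.0, so the addition already rounds back to 1.0 and the subtraction returns exactly zero.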
The two's complement representation that we have mentioned above is
mostly useful for addition and subtraction. It only complicates things for
multiplication and division. For multiplication and division, it is better to use a
representation with sign + absolute value. Since multiplication and division
are more common with floating-point numbers, and since they result in
multiplication and division of the mantissa, it is more advantageous to have
the mantissa represented as sign + absolute value. The exponents are
added, so it is more common to use two’s complement (or some related
representation) for the exponent.
Usually, computers manipulate data in chunks of 8, 16, 32, 64, or 128 bits. It
is therefore useful to fit a single floating-point number with both mantissa
and exponent in such a chunk. In such a chunk, we need to have room for
the sign (1 bit), the mantissa, and the exponent. While there are many
different ways of dividing the remaining bits between the mantissa and the
exponent, in practice most computers now use a standard called IEEE 754,
which mandates the formats shown in figure 8.1.
Figure 8.1: Formats of floating point numbers
Floating Point Variables
Floating point variables have been represented in many different ways
inside computers of the past. But there is now a well adhered to standard for
the representation of floating point variables. The standard is known as the
IEEE Floating Point Standard (FPS). Like scientific notation, FPS represents
numbers with multiple parts, a sign bit, one part specifying the mantissa and
a part representing the exponent. The mantissa is represented as a signed
magnitude integer (i.e., not 2's complement), where the value is normalized.
The exponent is represented as an unsigned integer which is biased to
accommodate negative numbers. An 8-bit unsigned value would normally
have a range of 0 to 255, but 127 is added to the exponent; since the
all-zeros and all-ones patterns are reserved, this gives an effective range
of -126 to +127.
Follow these steps to convert a number to FPS format.
1. First convert the number to binary.
2. Normalize the number so that there is one nonzero digit to the left of the
binary place, adjusting the exponent as necessary.
3. The digits to the right of the binary point are then stored as the mantissa
starting with the most significant bits of the mantissa field. Because all
numbers are normalized, there is no need to store the leading 1.
Note: Because the leading 1 is dropped, it is no longer proper to refer to
the stored value as the mantissa. In IEEE terms, this mantissa minus its
leading digit is called the significand.
4. Add 127 to the exponent and convert the resulting sum to binary for the
stored exponent value. For double precision, add 1023 to the exponent.
Be sure to include all 8 or 11 bits of the exponent.
5. The sign bit is a one for negative numbers and a zero for positive
numbers.
6. Compilers often express FPS numbers in hexadecimal, so a quick
conversion to hexadecimal might be desired.
Here are some examples using single precision FPS.
3.5 = 11.1 (binary)
    = 1.11 x 2^1; sign = 0, significand = 1100...,
exponent = 1 + 127 = 128 = 10000000
FPS number (3.5) = 0 10000000 11000000000000000000000
                 = 0x40600000
100 = 1100100 (binary)
    = 1.100100 x 2^6; sign = 0, significand = 100100...,
exponent = 6 + 127 = 133 = 10000101
FPS number (100) = 0 10000101 10010000000000000000000
                 = 0x42c80000
What decimal number is represented in FPS as 0xc2508000?
Here we just reverse the steps.
0xc2508000 = 11000010010100001000000000000000 (binary)
sign = 1; exponent = 10000100; significand = 10100001000000000000000
exponent = 132 ==> 132 - 127 = 5
-1.10100001 x 2^5 = -110100.001 = -52.125
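These worked examples can be checked mechanically. Python's standard struct module packs a float into its IEEE single-precision bit pattern, so the following sketch (the function names are mine) reproduces all three results:

```python
import struct

def float_to_fps(x):
    """Return the single-precision (FPS) bit pattern of x as hex."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return "0x%08x" % bits

def fps_to_float(bits):
    """Decode a 32-bit FPS pattern back to the number it represents."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

print(float_to_fps(3.5))          # 0x40600000
print(float_to_fps(100.0))        # 0x42c80000
print(fps_to_float(0xC2508000))   # -52.125
```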
Floating Point Arithmetic
Until fairly recently, floating point arithmetic was performed using complex
algorithms with an integer ALU, and the main ALU in CPUs is still an
integer ALU. However, in the mid-1980s, special hardware was developed
to perform floating point arithmetic. Intel, for example, sold a chip
known as the 80387 which was a math co-processor to go along with the
80386 CPU. Most people did not buy the 80387 because of the cost. A
major selling point of the 80486 was that the math co-processor was
integrated onto the CPU which eliminated the purchase of a separate chip to
get faster floating point arithmetic.
Floating point hardware usually has a special set of registers and
instructions for performing floating point arithmetic. There are also special
instructions for moving data between memory or the normal registers and
the floating point registers.
Addition of Floating-Point Numbers
The steps (or stages) of a floating-point addition:
1. The exponents of the two floating-point numbers to be added are
compared to find the number with the smaller exponent.
2. The significand of the number with the smaller exponent is shifted so
that the exponents of the two numbers agree.
3. The significands are added.
4. The result of the addition is normalized.
5. Checks are made to see if any floating-point exceptions occurred during
the addition, such as overflow.
6. Rounding occurs.
Floating-Point Addition Example
Example: s = x + y
numbers to be added are x = 1234.00 and y = -567.8
these are represented in decimal notation with a mantissa (significand)
of four digits
six stages (A - F) are required to complete the addition
Step   A           B            C           D          E          F
X      0.1234E4    0.12340E4
Y     -0.5678E3   -0.05678E4
S                               0.06662E4   0.6662E3   0.6662E3   0.6662E3
(For this example, we are throwing out biased exponents and the assumed
1.0 before the magnitude. Also all numbers are in the decimal number
system and no complements are used.)
Time for Floating-Point Addition
Consider a set of floating-point additions sequentially following one another
(as in adding the elements of two arrays)
Assume that each stage of the addition takes t time units.
Time:   t        2t       3t       4t       5t       6t       7t       8t
Step
A       x1+y1                                                  x2+y2
B                x1+y1                                                  x2+y2
C                         x1+y1
D                                  x1+y1
E                                           x1+y1
F                                                    x1+y1
Each floating-point addition takes 6t time units
Pipelined Floating-Point Addition
With the proper architectural design, the floating-point addition stages can
be overlapped
Time:   t      2t     3t     4t     5t     6t     7t     8t
Step
A       x1+y1  x2+y2  x3+y3  x4+y4  x5+y5  x6+y6  x7+y7  x8+y8
B              x1+y1  x2+y2  x3+y3  x4+y4  x5+y5  x6+y6  x7+y7
C                     x1+y1  x2+y2  x3+y3  x4+y4  x5+y5  x6+y6
D                            x1+y1  x2+y2  x3+y3  x4+y4  x5+y5
E                                   x1+y1  x2+y2  x3+y3  x4+y4
F                                          x1+y1  x2+y2  x3+y3
This is called pipelined floating-point addition. Once the pipeline is full and
has produced the first result in 6t time units, it only takes 1t time units to
produce each succeeding sum.
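The timing argument reduces to a small formula: k sequential additions cost 6t·k, while the pipeline costs 6t to fill and then delivers one result per t, i.e. (6 + k - 1)t in total. A sketch (the function name and parameters are illustrative):

```python
def addition_time(k, stages=6, t=1):
    """Return (sequential, pipelined) time for k floating-point adds."""
    sequential = stages * t * k            # each add occupies all stages
    pipelined = (stages + k - 1) * t       # fill once, then one per t
    return sequential, pipelined

print(addition_time(8))     # (48, 13)
print(addition_time(100))   # (600, 105)
```

The speedup approaches the number of stages (here 6) as k grows large.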
8.4 Real Numbers
One sometimes hears variations on the phrase “computers can't represent
real numbers exactly”. This, of course, is not true. Nothing prevents us from
representing (say) the square root of two as the number two and a bit
indicating that the value is the square root of the representation. Some
useful operations could be very fast this way. It is true, though, that we
cannot represent all real numbers exactly. In fact, we have a similar problem
as that we have with rational numbers, in that it is hard to pick a useful
subset that we can represent exactly, other than the floating-point numbers.
For this reason, no widespread hardware contains built-in real numbers
other than the usual approximations in the form of floating-point.
8.5 Summary
In this unit, we have dealt with various core operations to be performed in
arithmetic viz. Addition, Subtraction, Multiplication and Division. We have
discussed the various techniques for fixed point unsigned and signed
numbers arithmetic. We have also discussed Booth algorithm for
multiplication of binary numbers. A separate section is dedicated to the
Floating point standard prevalent today that includes the IEEE standard
also. We have also seen the addition operation for floating point numbers,
laying stress on the time constraint and pipelining concepts.
Self Assessment Questions
1. Using 4-bit fixed length binary, the addition of 4 and 14 results in
__________.
2. The two’s complement of -5 is __________.
3. The basic idea behind floating-point numbers is to represent a number
as __________.
4. IEEE FPS represents numbers with __________.
5. Floating point hardware usually has a special set of __________ and
__________ for performing floating point arithmetic.
8.6 Terminal Questions
1. Explain the addition of two floating point numbers with examples.
2. Discuss the different formats of floating point numbers.
3. Compute the product of 7 and 2 using Booth’s algorithm.
8.7 Answers
Self Assessment Questions
1. overflow
2. (1011)
3. as a mantissa and an exponent
4. a sign bit, the mantissa and the exponent
5. registers, instructions
Terminal Questions
1. Refer Section 8.2
2. Refer Section 8.3
3. Refer Section 8.2
Computer Organization and Architecture Unit 9
Sikkim Manipal University - DDE Page No. 220
Unit 9 Memory Unit – Part I
Structure:
9.1 Introduction
Objectives
9.2 Characteristics of Memory Systems
9.3 Main Memory
Types of Random-Access Semiconductor Memory
Organization
Static and dynamic memories
9.4 Memory system considerations
Design of memory subsystem using Static Memory Chips
Design of memory subsystem using Dynamic Memory Chips
9.5 Memory interleaving
9.6 Summary
9.7 Terminal Questions
9.8 Answers
9.1 Introduction
The memory unit is used for storage and retrieval of data and instructions. A
typical computer system is equipped with a hierarchy of memory
subsystems, some internal to the system and some external. Internal
memory is accessible by the CPU directly, while external memory is
accessed by the CPU through an I/O module.
Objectives:
By the end of Unit 9, you should be able to:
1. Define the different units of transfer of data.
2. Explain the various accessing methods
3. Explain with neat sketches various memory organizations.
4. Discuss the design of memory subsystem using dynamic memory chips.
9.2 Characteristics of Memory Systems
Memory systems are classified according to their key characteristics. The
most important are listed below:
Location
The classification of memory is done according to the location of the
memory as:
CPU: The CPU requires its own local memory in the form of registers,
and the control unit also requires fast-access local memory. We have
already studied this in detail in our earlier
discussions.
Internal (main): It is often equated with the main memory, though there
are other forms of internal memory. We will discuss internal
memory in the coming sections of this unit.
External (secondary): It consists of peripheral storage devices like hard
disks, magnetic disks, magnetic tapes, CDs etc.
Capacity
Capacity is one of the important aspects of the memory.
Word size: Word size is the natural unit of organization of memory. The
size of the word is typically equal to the number of bits used to represent a
number and is equal to the instruction length. But there are many
exceptions. Common word lengths are 8, 16 and 32 bits.
Number of words: The addressable unit is the word in many systems.
However, external memory capacity is generally expressed in terms of bytes.
Unit of Transfer
Unit of transfer for internal memory is equal to the number of data lines into
and out of memory module.
Word: For internal memory, the unit of transfer is equal to the number
of data lines into and out of the memory module. This need not be equal
to a word or an addressable unit.
Block: For external memory, data are often transferred in much larger
units than a word, and these are referred to as blocks.
Access Method
Sequential: Tape units have sequential access. Data are generally
stored in units called "records". Data is accessed sequentially; the
records may be passed (or rejected) until the record that is searched for
is found. The access time to a certain record is highly variable.
Direct: Individual blocks or records have a unique address based on
physical location. A block may contain a group of data. Access is
accomplished by direct address to reach general vicinity, plus sequential
searching, counting or waiting to reach the final location. Disk units
have direct access.
Random: Each addressable location in memory has a unique,
physically wired-in addressing mechanism. The time to access a given
location is independent of the sequence of prior accesses and constant.
Any location can be selected at random and directly addressed and
accessed. Main memory and some cache systems are random access.
Associative: This is a random-access type of memory that enables one
to make a comparison of desired bit locations within a word for a
specified match, and to do this for all words simultaneously. Thus, a
word is retrieved based on a portion of its contents rather than its
address. Some cache memories may employ associative access.
Performance
Access time: For random-access memory, this is the time it takes to
perform a read or write operation. That is, the time from the instant that
an address is presented to the memory to the instant that data have
been stored or made available for use. For non-random-access memory,
access time is the time it takes to position the read-write mechanism at
the desired location.
Cycle time: This applies to random-access memory. It consists of the
access time plus any additional time required before a second access
can commence.
Transfer rate: This is the rate at which data can be transferred into or
out of a memory unit. For random-access memory, it is equal to
1/(cycle time).
For non-random-access memory, the following relationship holds:
TN = TA + N/R
where TN = average time to read or write N bits,
TA = average access time,
N = number of bits,
R = transfer rate, in bits per second (bps).
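As a worked instance of this relationship (the numbers below are illustrative assumptions, not taken from the text):

```python
def read_time(ta, n_bits, rate_bps):
    """TN = TA + N/R for a non-random-access memory."""
    return ta + n_bits / rate_bps

# e.g. 10 ms average access time, a 4096-bit block, 1 Mbit/s transfer rate
print(read_time(0.010, 4096, 1_000_000))   # ≈ 0.014096 s (10 ms + 4.096 ms)
```

The access-time term dominates for small transfers, while the N/R term dominates for large blocks.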
Physical Type
Semiconductor: Main memory, cache. RAM, ROM.
Magnetic: Magnetic disks (hard disks), magnetic tape units.
Optical: CD-ROM, CD-RW.
Magneto-Optical: The recording technology is fundamentally magnetic.
However an optical laser is used. The read operation is purely optical.
Physical Characteristics
Volatile/Non-volatile: In a volatile memory, information decays naturally
or is lost when electrical power is switched off. In a non-volatile memory,
information once recorded remains without deterioration until
deliberately changed; no electrical power is needed to retain information.
Magnetic-surface memories are nonvolatile. Semiconductor memories
may be either volatile or non-volatile.
Erasable/Non-erasable: Non-erasable memory cannot be altered
(except by destroying the storage unit). ROMs are non-erasable.
Memory Hierarchy
Design constraints: How much? How fast? How expensive?
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit,
Greater capacity, slower access time.
9.3 Main Memory
The main memory stores data and instructions. Main memories are usually
built from dynamic ICs known as dynamic RAMs (DRAMs). Semiconductor
ICs can also implement static memories, referred to as static RAMs
(SRAMs). SRAMs are faster, but their cost per bit is higher. They are often
used to build caches.
Types of Random-Access Semiconductor Memory:
Dynamic RAM (DRAM): Example: Charge in capacitor. It requires periodic
refreshing.
Static RAM (SRAM): Example: Flip-flop logic-gates. Applying power is
enough (no need for refreshing). A dynamic RAM cell is simpler and hence
smaller than a static RAM cell, so DRAMs are denser and less expensive,
but they require supporting refresh circuitry. Static RAMs are faster than
dynamic RAMs.
ROM: The data is actually wired in the factory. It can never be altered.
PROM: Programmable ROM. It can only be programmed once after its
fabrication. It requires a special device to program.
EPROM: Erasable Programmable ROM. It can be programmed multiple
times. The whole capacity needs to be erased by ultraviolet radiation before
a new programming activity; it cannot be partially erased or reprogrammed.
EEPROM: Electrically Erasable Programmable ROM. Erased and
programmed electrically. It can be partially programmed. Write operation
takes considerably longer time compared to read operation.
In general, the more flexible a ROM variant is, the more expensive it is to
build and the smaller its capacity compared to less flexible variants.
Organization
Basic element of semiconductor memory is the memory cell. All
semiconductor memory cells have certain properties:
Have two stable states that represent binary 0 and 1.
Capable of being written into (at least once), to set the state.
Capable of being read to sense the state.
Individual cells can be selected for reading and writing operations.
The cell has three functional terminals, which are shown in figure 9.1. The
select terminal selects a cell for a read or write operation, the control
terminal indicates read or write, and the third terminal is used to write into
the cell, that is, to set the state of the cell to 0 or 1. For a read operation,
the same third terminal is used to output the state of the cell.
Figure 9.1: Memory cell organization (a) for write and (b) for read operation
Chip Logic
Semiconductor memory comes in packaged chips. Each chip contains an
array of memory cells. Two organizational approaches have been used:
2D and 2½D.
Figure 9.2: 2D memory organizations
A) 2D memory organization
In a 2D memory organization the physical arrangement of the memory cells
in the array matches the logical arrangement of words in memory. Figure 9.2
shows the 2D memory organization. A memory chip has W words of B bits
each; hence its capacity is (W x B) bits.
E.g.: A 16-Mbit memory chip can be organized as 1M 16-bit words or as 4M
4-bit words. The 1st memory chip will have 20 address and 16 data lines,
the 2nd memory chip will have 22 address and 4 data lines. Remember that
the lines may be multiplexed, so fewer address pins can be used on a
specific memory chip.
Number of Address Lines = log2 W.
The figure also depicts the additional circuitry. Address lines supply the
address of the word to be selected. A total of log2 W address lines are
input to the decoder, which activates exactly one of its W output lines
according to the address bit pattern.
Example: Address 0101 input to the decoder activates the 6th output line
(addresses start from 0). This output is then used to select one of the word
lines. The data lines transfer the B bits of the selected word between the
chip and the data buffer, all bits simultaneously. The disadvantage of the
2D organization is that all bits of any given word are on the same chip.
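The address- and data-line arithmetic from the 16-Mbit example can be sketched with a toy helper (the function name is mine, not from the text):

```python
import math

def chip_pins(total_bits, word_bits):
    """Address and data line counts for a (W x B)-bit 2D organization."""
    words = total_bits // word_bits          # W = capacity / word size
    return math.ceil(math.log2(words)), word_bits

mbit16 = 16 * 2 ** 20                        # a 16-Mbit chip
print(chip_pins(mbit16, 16))   # (20, 16): the 1M x 16 organization
print(chip_pins(mbit16, 4))    # (22, 4):  the 4M x 4 organization
```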
B) 2½D memory organization
Figure 9.3: 2½ D memory organization
The 2½D memory organization is as shown in figure 9.3. In 2½D memory
organization the bits of a particular word are spread across multiple chips.
The most extreme and most common organization is to allow only 1 bit of a
given word on a chip. Now the array itself is like a matrix, each cell is
connected to a row line and a column line. For any operation to select a bit
of a particular word, the word address is split into two. One part is fed to the
decoder to select one row and second part of address is fed to another
decoder to select one of the columns.
From the system standpoint, the memory unit can be viewed as a “black
box”. Data transfer between the main memory and the CPU registers takes
place through two registers, namely MAR (memory address register) and
MDR (memory data register). If MAR is k bits long and MDR is n bits long,
the main memory unit can contain up to 2^k addressable locations. During a
memory cycle, n bits of data are transferred between main memory and
CPU. This transfer takes place over the processor bus, which has k address
lines and n data lines.
Figure 9.4: Connection of main memory to CPU
The bus also includes control lines Read, Write, and Memory Function
Completed (MFC) for coordinating the data transfers. In the case of byte
addressable computers, another control line is added to indicate when only
a byte, rather than a full word of n bits, is transferred. Fig. 9.4 shows the
connection between the CPU and main memory. The CPU initiates a
memory operation by loading the appropriate data into registers MDR and
MAR, and setting either Read or Write memory control line to 1. When the
required operation is completed the memory control circuitry sends MFC
signal to CPU.
The time that elapses between the initiation of an operation and the
completion of that operation is called the memory access time. The minimum
time delay between two successive memory operations is called the memory
cycle time. The cycle time is usually slightly longer than the access time.
The main memory is Random Access Memory.
Therefore any location can be accessed for a Read or Write operation in
some fixed amount of time. Semiconductor Integrated Circuits are used for
the implementation of main memories.
Some of the techniques used to increase the effective speed and size of the
main memory are discussed in the following sections.
Static and Dynamic Memories
Semiconductor memories in which the storage cells are small transistor
circuits are used for high speed CPU registers. Single chip RAMs can be
manufactured in sizes ranging from a few hundred bits to 1 GB or more.
Semiconductor memories fall into two categories: SRAMs (static RAMs) and
DRAMs (dynamic RAMs). Both bipolar transistors and MOS (Metal Oxide
Semiconductor) transistors are used for designing RAMs, but MOS is the
dominant technology for designing large RAMs.
A semiconductor memory constructed using bipolar or MOS transistors
stores information in the form of flip-flop voltage levels. These voltage
levels do not decay on their own, so the information remains unchanged for
long periods of time. Such memories are called static memories. A
semiconductor memory designed using a MOS transistor with a capacitor
stores the information in the form of charge on the capacitor. The stored
charge has a tendency to leak away: a stored ‘1’ will become ‘0’ if no
precautions are taken. These types of memory are called dynamic
memories.
A) Static memories
Static RAM cells use 4 – 6 transistors to store a single bit of data. This
provides faster access times at the expense of lower bit densities. A
processor's internal memory (registers and cache) is fabricated using static
RAM. SRAMs resemble the flip-flops used in the processor design. SRAM
cells differ from flip-flops primarily in the methods used to address the cells
and transfer data to and from them.
The six-transistor SRAM cell is shown in Fig. 9.5. A signal applied to the
word line (also called the address line) by the address decoder selects the
cell for either a Read or a Write operation. The two bit lines (also called
data lines) are used to transfer the stored data and its complement between
the cell and the data drivers.
Fig. 9.5: An n-channel MOS memory cell
Static RAM is used extensively for second level cache memory, where its
speed is needed and a relatively small memory will lead to a significant
increase in performance.
B) Dynamic memories
The bulk of a modern processor's memory is composed of dynamic RAM
(DRAM) chips. A DRAM memory cell uses a single transistor and a
capacitor to store a bit of data. In a DRAM cell, the ‘1’ and ‘0’ states
correspond to the presence or absence of a stored charge in a capacitor
controlled by the transistor switching circuit. Since a DRAM cell can be
constructed from a single transistor, the storage density is higher. Since the
charge stored in a DRAM may leak away with time, the cell must be
periodically refreshed.
Fig. 9.6 illustrates a one-transistor DRAM cell. It uses a MOS transistor,
which acts as a switch, and a capacitor to store a data bit. To write
information into the cell, a voltage signal is applied to the data line; the
signal can be either high or low, representing 1 and 0 respectively. A signal
is applied to the word line to switch on T. The capacitor then charges if the
data line is 1. When the transistor is off, the capacitor begins to discharge,
due both to the capacitor’s own leakage resistance and to the fact that the
transistor continues to conduct a very small amount of current after it is
turned off. Hence the information stored in the cell can be retrieved correctly
only if it is read before the charge on the capacitor drops below some
threshold value. The memory cell is therefore refreshed every time its
contents are read. When a DRAM is being refreshed, other accesses must
be "held off". This increases the complexity of DRAM controllers.
Fig. 9.6: A single-transistor dynamic memory cell
To read the cell, the word line is activated. The charge stored in the
capacitor is then transferred to the bit line, where it is detected. Almost all
DRAMs require the address applied to the device to be asserted in two
parts: a row address and a column address. This has a deleterious effect on
the access time, but enables devices with large numbers of bits to be
fabricated with fewer pins (enabling higher densities).
Let us study the internal organization of a 64K × 1 dynamic memory chip.
The cells are arranged in the form of a square array. The higher-order 8 bits
of the 16-bit address constitute the row address of a cell and the lower-order
8 bits constitute the column address. The row and column addresses are
multiplexed on eight pins. This is done to reduce the number of pins needed
for external connections. During a Read or Write operation the row address
is applied first. It is loaded into the row address latch under the control of
the Row Address Strobe (RAS) input of the chip.
Then a Read operation is initiated in which all cells in the selected row are
read and refreshed. Soon after the row address is loaded, the column
address is applied to the address pins and loaded into the column address
latch under the control of the Column Address Strobe (CAS) signal. The
column address is decoded in the column decoder and the appropriate
write/sense circuit is selected. If the R/W control signal indicates a Read
operation, the output of the selected circuit is transferred to the data output DO.
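The two-part addressing can be sketched as follows, assuming the 16-bit address and eight shared address pins described above; the function name is ours, for illustration:

```python
def split_address(addr):
    """Split a 16-bit DRAM address into the row and column bytes that are
    driven onto the chip's 8 shared address pins, one after the other."""
    assert 0 <= addr < 1 << 16
    row = (addr >> 8) & 0xFF   # high-order 8 bits, latched under RAS
    col = addr & 0xFF          # low-order 8 bits, latched under CAS
    return row, col

print(split_address(0xA53C))  # (0xA5, 0x3C)
```

Multiplexing halves the number of address pins at the cost of applying the address in two steps.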
[Figure: the row address latch and row decoder selecting a row of the 256 × 256 cell array; sense/write amplifiers feeding the column decoder and column address latch; inputs RAS, CAS, CS, R/W and A7-0, with data input DI and data output DO]
Fig. 9.7: Internal organization of a 64K × 1 dynamic memory chip
To perform a Write operation, the information at the data input DI is given to
the column decoder. The information is then used to overwrite the contents
of the selected cell in the corresponding column.
Application of a row address causes all cells on the corresponding row to be
read and refreshed during both Read and Write operations. To ensure that
the contents of a dynamic memory are maintained, each row of cells must be
addressed periodically. A refresh circuit usually performs this function. In
some dynamic memory chips the refresh facility is available within the chip
itself. In such cases the dynamic nature of these chips is completely invisible
to the user. Such chips are known as pseudo-static.
Advantages
1. High bit density
2. Available chips range from 1K to 4M bits, and even larger chips are
being developed.
3. Low power dissipation
Disadvantages
Slower speed of operation
9.4 Memory System Considerations
While choosing a RAM chip for a given application several factors have to
be taken into account. The most important factors are the speed, power
dissipation, and size of the chip. In some situations other features such as
availability of block transfers may be important. Because of the high power
dissipation in bipolar circuits, high bit densities cannot be achieved. Hence a
memory constructed using bipolar chips needs a relatively large number of
chips. Bipolar (static) memories are used only when very fast
operation is required.
Static MOS memory chips have higher densities and slightly longer access
time than bipolar memories. They have lower densities than dynamic
memories, but are easier to use because they do not require refreshing
circuits. Since they do not need refreshing, static RAMs consume less power
than dynamic RAMs, so SRAMs are found in battery-powered systems. The
absence of refresh circuitry also leads to slightly simpler systems, so SRAM
is also found in very small systems, where the simplicity of the circuitry
compensates for the cost of the memory devices themselves.
Dynamic MOS memory is the dominant technology used in computer main
memories. Since higher densities are achieved with this technology, it is
economical to implement large memories using dynamic MOS memory.
Let us study how a memory subsystem is designed using static and
dynamic memory chips.
Design of memory subsystem using Static Memory Chips
Consider a small memory consisting of 64K words, each word of 16 bits.
Figure 9.8 shows the organization of this memory using 16K × 1 static memory
chips. A static memory chip has a control input called Chip Select
(CS). When CS is set to 1, it enables the chip to accept data input or to
place data on the output line. The data output of each chip is of the three-state
type: only the selected chip places data on the output line, while all the
other outputs are in the high-impedance state.
There are 4 chips in each column, and each column represents one bit position.
There are 16 such columns, building a 64K × 16 memory. The address bus
required for this memory is 16 bits wide. The high-order 2 bits of the
address are decoded to obtain the 4 chip-select control signals. The
remaining 14 address bits are used to access the specified bit location inside
each chip of the selected row. The R/W inputs of all chips are connected
together to provide a common Read/Write control.
Fig. 9.8: Organization of a 64K × 16 memory using static memory chips
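The chip-select decoding just described can be sketched as follows; the function name is ours, for illustration:

```python
def decode(addr):
    """Decode a 16-bit address for the 64K x 16 memory built from four
    rows of 16K x 1 chips, as described above."""
    assert 0 <= addr < 1 << 16
    chip_row = addr >> 14      # high-order 2 bits -> one of 4 chip-select lines
    internal = addr & 0x3FFF   # remaining 14 bits -> location inside each chip
    return chip_row, internal

print(decode(0xC001))  # (3, 1): chip row 3, bit location 1 in every chip
```

All 16 chips of the selected row respond together, each contributing one bit of the 16-bit word.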
Design of memory subsystem using Dynamic Memory Chips
Now consider a memory implemented using dynamic memory chips. The
organization of this memory is the same as that of the static memory. However,
the control circuitry of this memory differs from that of the static memory in three
respects. The block diagram of a 256K × 16 dynamic memory is
shown in figure 9.9. First, the row and column parts of the address for each
chip have to be multiplexed. Second, a refresh circuit is needed. Finally, the
timing of the various steps of the memory cycle must be carefully controlled.
[Figure: the access control, refresh control, refresh counter, address multiplexer, timing control and decoder blocks of the control circuitry, driving the RAS, CAS, CS0-3 and R/W inputs of a 4 × 16 array of DRAM chips; bus signals include ADRS17-0, DATA15-0, Read/write, Start, MFC, Busy, and the refresh request, refresh grant, refresh line and Row/column signals]
Figure 9.9: A block diagram of a 256K × 16 dynamic memory
The block diagram consists of the dynamic memory chips and the control
circuitry. The dynamic memory chips are arranged in a 4 × 16 array. If the
individual chips have a 64K × 1 organization, then the total storage capacity
of the array is 256K words, each word of 16 bits. The control circuitry provides
the multiplexed address, chip select, and row and column address strobe
signals (RAS and CAS) to the memory chip array. The memory unit is assumed
to be connected to an asynchronous memory bus that has 18 address lines
(ADRS17-0), 16 data lines (DATA15-0), two handshake signals (Memory
Request and MFC) and a Read/Write line to indicate the type of
memory cycle required.
Operation of the control circuitry for a memory read cycle
1. The cycle starts when the CPU activates the address, the Read/Write
and the Memory Request lines.
2. When the Memory Request signal becomes active, the access control
block recognizes the request and sets the Start signal to 1.
3. The timing and control circuit sends a Busy signal in order to prevent the
access control block from accepting new requests until the current
cycle ends.
4. The timing and control block then loads the row and column addresses into
the memory chips by activating RAS and CAS. It first uses the
Row/column line to select the row address (ADRS15-8), followed by the
column address (ADRS7-0).
5. The decoder block decodes the two most significant bits of the address and
generates the 4 chip-select signals CS0-3.
6. After receiving the row and column parts of the address, the selected
memory chips place the contents of the requested bit cells on their data
outputs.
7. This data is then transferred to the data lines of the memory bus through
appropriate drivers.
8. The timing and control circuit now sends the MFC signal to the CPU, indicating
that the requested data is available on the memory bus.
9. Finally the busy signal is deactivated so that the access control unit is
now free to accept new requests.
The timing unit is responsible throughout the process to ensure that various
signals are activated and deactivated in accordance with the specifications
of the particular type of memory chips used.
Control sequence for refresh operation
The main purpose of the refresh circuit is to maintain the integrity of the
stored contents of the cells. Refresh requests are given priority over CPU
access requests to ensure that no information is lost when both arrive
simultaneously.
Sequence of refresh operation is as follows:
1. The refresh control block periodically generates refresh requests,
causing the access control block to start a memory cycle in the normal
way.
2. The access control block arbitrates between memory access requests
and refresh requests. If two requests are activated simultaneously,
refresh requests are given priority in order to ensure that no information
is lost.
3. As soon as the refresh control block receives the refresh grant signal, it
activates the refresh line.
4. The address multiplexer selects the refresh counter, instead of ADRS15-8,
as the source for the row address, and the contents of the counter are
loaded into the row address latches of all the memory chips when the
RAS signal is activated.
5. During this time, the Read/Write line may indicate a Write operation. It
is important to ensure that this does not inadvertently cause new
information to be loaded into some of the cells that are being refreshed.
This protection can be provided in several ways. One way is to
have the decoder block deactivate all CS lines so that the memory
chips do not respond to the Read/Write line. The remainder of the
refresh cycle is then the same as in a normal cycle.
6. At the end, the refresh control block increments the refresh counter in
preparation for the next refresh cycle.
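The arbitration and counting in the steps above can be sketched as follows; the function names and return values are illustrative, not the actual control signals:

```python
# Sketch of the refresh arbitration (step 2) and the 7-bit refresh
# counter (steps 4 and 6) described above.

def arbitrate(access_request, refresh_request):
    """Grant at most one request per cycle; a refresh request wins a
    simultaneous arrival so that no stored information is lost."""
    if refresh_request:
        return "refresh"
    if access_request:
        return "access"
    return "idle"

refresh_counter = 0

def next_refresh_row():
    """Return the row to refresh, then advance the counter for the next
    refresh cycle; 7 bits suffice, so it wraps at 128."""
    global refresh_counter
    row = refresh_counter
    refresh_counter = (refresh_counter + 1) % 128
    return row

print(arbitrate(True, True))  # refresh wins the tie
```

A real controller does this in hardware, of course; the sketch only fixes the priority rule and the counter's wrap-around.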
Although an 8-bit row address is required, the refresh counter needs to be only
seven bits wide because of the cell organization inside the memory chips. In
fact, the 256 × 256 array in Fig. 9.9 consists of two 128 × 256 arrays, each
having its own set of write/sense circuits. One row of each of the two
arrays is accessed during any memory cycle, depending on the lower-order
7 bits of the row address. The most significant bit is used only in a normal
Read/Write cycle, to select one of the two groups of 256 columns.
Because of this organization, the frequency of refresh operations can be
reduced to half of what would be needed if the memory cells had been
organized in a single 256 × 256 array.
The main purpose of the refresh circuit is to maintain the integrity of the
stored information. Its existence should be invisible to the remainder of the
computer system. Also other parts of the system should not be affected by
the operation of the refresh circuit. If memory access and refresh requests
occur simultaneously, the refresh circuit is given priority so that no stored
information is lost. Thus the response of the memory to a request from the CPU
may be delayed if a refresh operation is in progress. The amount of delay
depends on the mode of refresh operation. During a refresh operation, all
memory rows may be refreshed in succession before the memory is
returned to normal use. A more common scheme, however, interleaves
refresh operations on successive rows with accesses from the memory bus,
which results in shorter, but more frequent refresh periods.
In the case of a synchronous memory bus, it may be possible to hide a refresh
cycle within the early part of a bus cycle, if sufficient time remains after the
refresh cycle to carry out a Read or Write access.
9.5 Memory interleaving
Another technique, called memory interleaving, divides the memory system into a
number of modules and arranges them so that successive words in the
address space are placed in different modules. If memory access requests
are made for consecutive addresses, then the accesses will be made to
different modules. Since parallel access to these modules is possible, the
average rate of fetching words from the main memory can be increased.
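A minimal sketch of this placement, assuming four modules and low-order interleaving as described:

```python
NUM_MODULES = 4

def locate(addr):
    """Low-order interleaving: consecutive addresses fall in consecutive
    modules, so a run of 4 addresses can be accessed in parallel."""
    return addr % NUM_MODULES, addr // NUM_MODULES  # (module, word within module)

print([locate(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

In hardware this costs nothing: the two low-order address bits select the module and the remaining bits address the word inside it.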
[Figure: four memory modules M1–M4, each with its own MAR and MDR, connected through a memory switching system to 4 instruction buffers in the processor; the data lines are 4 words wide]
Figure 9.10: Memory interleaving
Fig. 9.10 illustrates an interleaved memory where the main memory is divided into
4 modules. When an instruction fetch is issued by the processor,
a memory access circuit creates four consecutive addresses and places
them in the four MARs. A memory read command reads all four modules
simultaneously and retrieves four instructions. These are sent to the
processor. Thus each fetch retrieves 4 consecutive instructions.
One may combine interleaving and caching to reduce the speed mismatch
between the cache memory and the main memory. To illustrate this, consider
the time required to transfer a block of data from the main memory to
the cache when a read miss occurs. Assume that a cache with 8-word
blocks is used. When a cache miss occurs, the block that contains the desired
word must be copied from the main memory into the cache.
Assume that the hardware has the following properties:
1. It takes one clock cycle to send an address to the main memory.
2. The memory is built with DRAM chips that allow the first word of a block to be
accessed in 8 clock cycles; subsequent words of the block are
accessed in 4 clock cycles per word.
3. One clock cycle is needed to send one word to the cache.
If a single memory module is used, then the time needed to load the desired block
into the cache is 1 + 8 + (7 × 4) + 1 = 38 cycles.
Now assume that the main memory is divided into 4 modules using the
interleaving technique. When the starting address of the
block arrives at the memory, all four modules start accessing the
required data, using the higher-order bits of the address. After 8 clock cycles,
each module has one word of data in its MDR. These words are transferred
to the cache, one word at a time, during the next four clock cycles. During this
time the next word in each module is accessed. It then takes another 4
cycles to transfer these words to the cache. Therefore the total time required
to load the block from the interleaved memory is 1 + 8 + 4 + 4 = 17 cycles.
Thus interleaving reduces the block transfer time by more than a factor of 2.
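The two cycle counts can be reproduced directly from the stated timing assumptions:

```python
# Cycle count for loading one 8-word cache block, using the timing
# assumptions listed above.

ADDR = 1    # cycles to send the address to memory
FIRST = 8   # cycles to access the first word in a DRAM module
NEXT = 4    # cycles to access each subsequent word
XFER = 1    # cycles to send one word to the cache
BLOCK = 8   # words per cache block

# Single module: the words come out strictly one after another.
single = ADDR + FIRST + (BLOCK - 1) * NEXT + XFER
print(single)  # 38

# Four interleaved modules: all four access in parallel, so after the
# first 8-cycle access each group of 4 words is transferred in 4 cycles
# while the next word in every module is fetched in the background.
interleaved = ADDR + FIRST + (BLOCK // 4) * 4 * XFER
print(interleaved)  # 17
```

The saving comes entirely from overlap: the second group's 8-cycle access hides behind the first group's transfer.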
9.6 Summary
A computer system is equipped with a hierarchy of memory subsystems.
There are several memory types with very different physical properties. The
important characteristics of memory devices are cost per bit, access time,
data transfer rate, alterability, and compatibility with processor
technologies.
The main memory is a major component in any computer system. Its
characteristics in terms of size and speed play an important role in
determining the performance of a given computer. There are several
techniques available to increase the effective speed and size of the
memory. An intermediate memory called cache memory can be introduced
between main memory and CPU. We have introduced the reader the
principle, structure of cache memory along with the mapping functions and
replacement algorithms. Finally we touched upon few external memory
elements and another method to increase the effective size of the memory
is to introduce virtual memory.
Self Assessment Questions
1. ____________ requires its own local memory in the form of registers.
2. ____________ is often equated with the main memory.
3. Disk units have ____________.
4. ____________ is dominant technology for designing large RAMs.
5. Static RAM cells use ____________ transistors to store a single bit of
data.
9.7 Terminal Questions
1. Explain the characteristics of memory system.
2. Discuss the organization of main memory.
3. Explain the concept of memory interleaving.
9.8 Answers
Self Assessment Questions:
1. CPU
2. internal
3. direct access
4. MOS
5. 4 to 6
Terminal Questions:
1. Refer Section 9.2
2. Refer Section 9.3
3. Refer Section 9.5
Unit 10 Memory Unit – Part II
Structure:
10.1 Introduction
Objectives
10.2 Cache Memory
Principles of cache memory
Structure of cache and main memory
Performance using cache memory
Elements of Cache Design
Mapping functions
Replacement algorithms
10.3 External Memory
Magnetic Disk
RAID
10.4 Virtual memory
10.5 Memory Management in Operating Systems
10.6 Summary
10.7 Terminal Questions
10.8 Answers
10.1 Introduction
The memory unit is used for storage and retrieval of data and instructions. A
typical computer system is equipped with a hierarchy of memory
subsystems, some internal to the system and some external. Internal
memory systems are accessible by the CPU directly, and external memory
systems are accessible by the CPU through an I/O module.
Objectives:
By the end of Unit 10, you should be able to:
1. explain cache memory, with a neat diagram, and its interaction with the
processor
2. explain the cache read operation
10.2 Cache Memory
In every instruction cycle, the CPU accesses the memory at least once to
fetch the instruction, and often again to fetch the
operands. The rate at which the CPU can execute instructions is limited by
the memory cycle time. This limitation is due to the mismatch between the
memory cycle time and the processor cycle time. Ideally, the main memory
would be built with the same technology as the CPU registers, giving
memory cycle times comparable to processor cycle times, but this is too
expensive a strategy.
The solution is to exploit the principle of locality by providing a small, fast
memory between the CPU and the main memory. This memory is known as
cache memory. Thus the intention of using cache memory is to give memory
speed approaching the speed of fastest memories available, and at the
same time provide a large memory size at the price of less expensive types
of semiconductor memory.
Principles of Cache Memory
[Figure (a): the cache, consisting of a cache data memory and a cache tag memory (directory), with address, control and data lines; (b): word transfers between the CPU and the smaller, faster cache, and block transfers between the cache and the larger, slower main memory]
Figure 10.1 (a & b): Cache and main memory
The concept is illustrated in figure 10.1. The cache memory contains
copies of portions of main memory. When the CPU attempts to read a word from
main memory, a check is made to find whether the word is in the cache. If so,
the word is delivered to the CPU from the cache. If not, a block of main
memory consisting of a fixed number of words is read into the cache, and
then the word is delivered to the CPU. The reason for reading a whole
block from main memory is that it is likely that future references will be to
other words in the block.
At any time, a subset of main memory resides in the lines of the cache.
Each line of the cache contains a tag, which identifies the location in
memory of the block it contains. This
tag is usually a portion of the main memory address. The collection of tags
currently assigned to the cache is stored in a special memory,
called the cache tag memory or directory, as shown in fig 10.1(a). The time
required to access the cache memory is less than the time required to
access main memory.
During a Read or Write cycle, the CPU initiates the memory access by
placing an address on the memory bus. This address is compared with the
tag addresses of the cache. If a tag matches the address, a cache hit
is said to occur; otherwise a cache miss is said to occur. When a cache hit
occurs, the Read or Write operation is performed in the cache and the main
memory is not involved. In the case of a cache miss, the required data is
brought from main memory into the cache. When the cache is full, the cache
control hardware must decide which block is to be removed to create space
for the new block that contains the referenced word. The collection of rules
for making this decision constitutes the replacement algorithm. The complete
cache organization is shown in figure 10.1(c). The CPU does not know
explicitly about the existence of the cache.
Figure 10.1(c): Cache organization showing the interconnection with the
processor
When the CPU makes Read and Write requests using addresses that refer to
locations in the main memory, the memory access control circuitry
determines whether the requested word currently exists in the cache. The
transfers between the cache and the CPU are fast because of the small
block size and fast RAM access methods.
Structure of cache and main memory
The structure of the cache and main memory is shown in figure 10.2.
Main memory consists of 2^n addressable words, each word having a
unique n-bit address. This memory is considered to consist of a number of
fixed-length blocks of K words each; that is, M = 2^n / K blocks. The cache
consists of C lines of K words each. The number of lines C of the cache is
much smaller than the number of main memory blocks M:
C << M
Figure 10.2: Cache and Main memory
Cache Read operation
Now let us study how cache read operation is executed. The flow chart
illustrating the steps of cache read operation is given in figure 10.3. The
processor generates the address 'RA' of a word to be read. If the word is
contained in the cache, it is delivered to the processor. Otherwise, the block
containing that word is loaded into the cache and simultaneously the word is
delivered to the processor from main memory. These last two operations
occur in parallel.
Figure 10.3: Cache Read Operation
In case of interleaved memory, contiguous block transfers are very efficient.
Transferring data in blocks between the main memory and the cache
enables an interleaved memory to operate at its maximum possible speed.
A cache can be introduced in two general ways. They are look-aside design
and look-through design.
A) Look-aside design
Figure 10.4 shows a look-aside design. In this case both CPU and main
memory are directly connected to the system bus. When a cache miss
occurs, it results in data transfer between cache and main memory through
the system bus, making it unavailable for other I/O operations. This
drawback can be overcome in look-through design.
[Figure: CPU, cache and main memory all connected to the system bus; cache accesses, block replacements and main memory accesses share the same bus]
Figure 10.4: Organization of look-aside design
B) Look-through design
It is a faster, but more complex and costly, organization compared to the look-aside
design. A block diagram of the look-through design is given in figure
10.5. The communication between CPU and cache is through a separate
bus, which is isolated from the main system bus. Hence the system bus is
available for other units, such as I/O controllers to communicate with main
memory.
Thus memory accesses which do not involve the system bus can be
performed concurrently. In a look-through cache the local bus linking main
memory and cache is wider than the system bus, so data transfer
between cache and main memory is faster. For example, if the system data bus
is 32 bits wide and the cache block size is 128 bits, a 128-bit data bus might be
provided to link the cache and main memory.
Figure 10.5: Organization of look-through cache
The main drawback of this design is that it takes more time for main memory
to respond to the CPU when a cache miss occurs.
Performance using Cache Memory
The data is transferred from the main memory to the cache in blocks.
Typical block sizes are 4 to 64 words. When the cache is full, one of the
existing blocks is evicted using a standard replacement policy such as
First-In-First-Out (FIFO) or Least Recently Used (LRU). This block transfer is carried
out in anticipation that the block is very likely to be referenced again and
again in the near future. The performance of a system that employs cache
can be analyzed as follows:
Let t_c, h and t_m represent the cache access time, the hit ratio and the main
memory access time respectively. Then the average access time is given by
t_av = h·t_c + (1 - h)(t_c + t_m)
The hit ratio always lies in the closed interval 0 to 1, and it specifies the
relative number of successful references to the cache. The above equation
is derived using the fact that when there is a cache hit the main memory is
not accessed, while in the case of a cache miss both the cache and the main
memory are accessed.
Suppose the ratio of the main memory access time to the cache access time
is γ = t_m / t_c. Then an expression for the efficiency of a system that uses a
cache can be derived as follows:
Efficiency η = t_c / t_av
= t_c / [h·t_c + (1 - h)(t_c + t_m)]
= 1 / [h + (1 - h)(1 + t_m/t_c)]
= 1 / [h + (1 - h) + (1 - h)γ]
= 1 / [1 + (1 - h)γ]
Thus η is maximum when h = 1; i.e. the efficiency of a system that uses cache
is maximum when all the references are confined to the cache.
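A quick check of these formulas, with illustrative timings (10 ns cache, 100 ns main memory, 95% hit ratio; the numbers are assumptions, not values from the text):

```python
def avg_access_time(h, t_c, t_m):
    """t_av = h*t_c + (1 - h)*(t_c + t_m)."""
    return h * t_c + (1 - h) * (t_c + t_m)

def efficiency(h, gamma):
    """eta = 1 / (1 + (1 - h) * gamma), where gamma = t_m / t_c."""
    return 1.0 / (1.0 + (1.0 - h) * gamma)

# Illustrative numbers: 95% hit ratio, 10 ns cache, 100 ns main memory.
t_av = avg_access_time(0.95, 10, 100)
print(round(t_av, 6))                     # 15.0 ns on average
print(round(10 / t_av, 6))                # eta computed as t_c / t_av
print(round(efficiency(0.95, 10.0), 6))   # same value from the closed form
```

Note how a mere 5% miss rate drags a 10 ns cache down to 15 ns average access: the miss penalty dominates, which is why hit ratio matters so much.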
Elements of Cache Design
Cache Size: Small cache => performance problems, large cache => too
expensive.
Mapping Function:
a) Direct Mapping:
Each line of cache can store specific blocks of main memory. The line
number is given as
i = j modulo m,
where i = cache line number, j = main memory block number, and
m = number of lines in the cache.
b) Associative Mapping:
Permits loading of any main memory block into any line of the cache.
c) Set associative Mapping:
It is a compromise that exhibits the strengths of both the direct and
associative approaches while reducing their disadvantages.
We will be discussing the mapping function in detail in the following section:
Replacement Algorithm (for lines)
When a new block is brought into cache, one of the existing blocks must
be replaced. The most common algorithms are:
Least recently used (LRU)
First in first out (FIFO)
Least frequently used (LFU)
Random
We will be discussing the Replacement algorithms in detail in the following
section:
Write Policy:
Before a block that is resident in the cache can be replaced, it is
necessary to consider whether it has been altered in the cache but not in
the main memory.
Write through:
All write operations are both directly done to main memory and to cache.
Main memory always contains valid content.
Write back:
Writes are done only to the cache. An UPDATE bit is set whenever there
is a write. Before a cache block (line) is replaced, if its UPDATE bit is set,
its content is written back to main memory. The problem is that portions of
main memory are invalid for a certain period of time; if other devices access
those locations, they will get stale content. Therefore access to main
memory by I/O modules can only be allowed through the cache. This
makes for complex circuitry and a potential bottleneck.
Write once:
Problem: if there are other caches in the system (e.g. a multiprocessor
system sharing the same main memory), then even with "write through",
the other caches may contain invalid content.
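The write-through and write-back policies above can be contrasted in a small sketch; the class and function names are ours, for illustration:

```python
class Line:
    """One cache line with the UPDATE (dirty) bit from the text."""
    def __init__(self):
        self.data = None
        self.dirty = False

def write_through(line, memory, addr, value):
    line.data = value
    memory[addr] = value       # main memory always holds valid content

def write_back(line, value):
    line.data = value
    line.dirty = True          # memory is updated only when the line is evicted

def evict(line, memory, addr):
    if line.dirty:             # flush the altered line before replacement
        memory[addr] = line.data
        line.dirty = False

memory = {}
line = Line()
write_back(line, 7)
print(memory)                  # still empty: main memory is temporarily stale
evict(line, memory, 0x10)
print(memory)                  # {16: 7}
```

The window between `write_back` and `evict` is exactly the period during which main memory is invalid, which is what forces I/O to go through the cache.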
Line Size:
Greater line size => more hits (+) but also more line replacements (-). Too
big a line size => less chance of a hit for some parts of the block.
Researchers suggest that 2 to 8 words is reasonably close to
optimum.
Number of caches:
The cache internal to the processor is called the Level-1 (L1) cache; an
external cache is called a Level-2 (L2) cache.
Mapping functions
The correspondence between main memory blocks and cache lines is specified
by a mapping function. There are three standard mapping functions, namely
1. Direct mapping
2. Associative mapping
3. Block set associative mapping
In order to discuss these methods, consider a cache consisting of 128 blocks
of 16 words each. Assume that the main memory is addressable by a 16-bit
address. For mapping purposes, the main memory is viewed as composed of 4K
blocks.
1. Direct mapping technique
This is the simplest mapping technique. In this case, block K of the main
memory maps onto block K modulo 128 of the cache. Since more than one
main memory block is mapped onto a given cache block position, contention
may arise for that position even when the cache is not full. This is overcome
by allowing the new block to overwrite the currently resident block.
Figure 10.6: Direct mapping cache
A main memory address can be divided into three fields, TAG, BLOCK and
WORD, as shown in figure 10.6. The 5-bit TAG field is required to identify a main
memory block when it is resident in the cache. When a new block enters the
cache, the 7-bit BLOCK field determines the cache position in which this
block must be stored. On an access, the TAG field of the resident block is
compared with the TAG field of the address. If they match, then the desired
word is present in that block of the cache. If there is no match, then the block
containing the required word must first be read from the main memory and
then loaded into the cache.
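The field split above can be sketched as follows; the field widths follow from the 16-bit address, 128 blocks and 16 words per block, and the function name is ours:

```python
def address_fields(addr):
    """TAG/BLOCK/WORD split of a 16-bit address for the direct-mapped
    cache above (128 blocks of 16 words)."""
    assert 0 <= addr < 1 << 16
    word = addr & 0xF             # 4-bit WORD field: word within the block
    block = (addr >> 4) & 0x7F    # 7-bit BLOCK field: cache block position
    tag = addr >> 11              # 5-bit TAG field: identifies the memory block
    return tag, block, word

# The BLOCK field equals (main memory block number) modulo 128,
# i.e. i = j modulo m from the mapping rule:
addr = 0xABCD
assert address_fields(addr)[1] == (addr >> 4) % 128
print(address_fields(addr))  # (21, 60, 13)
```

The modulo falls out for free: taking the low 7 bits of the 12-bit block number is exactly "j modulo 128".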
2. Associative mapping technique
This is a much more flexible mapping technique. Here any main memory
block can be loaded to any cache block position. Associative mapping is
illustrated as shown in figure 10.7. In this case 12 tag bits are required to
identify a main memory block when it is resident in the cache.
Figure 10.7: Associative mapping cache
The tag bits of an address received from the CPU are compared with the tag
bits of each cache block to see if the desired block is present in the cache.
Here we need to search all 128 tag patterns to determine whether a given
block is in the cache. This type of search is called associative search.
Therefore the cost of implementation is higher. Because of the complete
freedom in block positioning, a wide range of replacement algorithms can be used.
3. Block set associative mapping
Figure 10.8: Block set associative mapping cache with two blocks per set
This is a combination of two techniques discussed above. In this case
blocks of the cache are grouped into sets and the mapping allows a block of
main memory to reside in any block of a particular set. Set associative
mapping is illustrated as shown in figure 10.8.
Consider an example, a cache with two blocks per set. The 6 bit set field of
the address determines which set of the cache might contain the addressed
block. The tag field of the address must be associatively compared to the
tags of the two blocks of the set to check if the desired block is present.
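A minimal sketch of such a lookup, assuming the 128-block, two-blocks-per-set cache above (64 sets, 6-bit set field, 6-bit tag); the names are ours, for illustration:

```python
def fields(addr):
    """Field split for the 64-set, two-way set-associative cache above."""
    word = addr & 0xF            # 4-bit word field
    set_no = (addr >> 4) & 0x3F  # 6-bit set field selects one of 64 sets
    tag = addr >> 10             # 6-bit tag field
    return tag, set_no, word

def read(cache, addr):
    """cache: list of 64 sets, each a list of (tag, 16-word block) pairs.
    Only the (at most two) tags within one set are compared associatively."""
    tag, set_no, word = fields(addr)
    for stored_tag, block in cache[set_no]:
        if stored_tag == tag:    # hit
            return block[word]
    return None                  # miss: the block must be fetched from memory

cache = [[] for _ in range(64)]
tag, set_no, _ = fields(0x1234)
cache[set_no].append((tag, list(range(16))))
print(read(cache, 0x1234))  # word 4 of the stored block
```

Compared with full associativity, the search shrinks from 128 tag comparisons to 2, which is the hardware saving the advantages below refer to.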
Advantages
a) The contention problem of the direct method is eased by having a few
choices for block placement.
b) The hardware cost is reduced by decreasing the size of the associative
search.
Note
The number of blocks per set can be selected according to the
requirements of a particular computer system. Four blocks per set can
be accommodated by a 5-bit set field, eight blocks per set by a 4-bit set
field, and so on. The extreme conditions are one block per set (the direct
mapping method) and 128 blocks per set (the fully associative technique).
Each cache block must be provided with a valid bit that indicates whether
the block contains valid data. When the main memory is loaded with
new programs and data from mass storage devices, the valid bits are
cleared to 0; the valid bit of a particular cache block is set to 1 the first
time that block is loaded from the main memory. It stays at 1 unless the
corresponding main memory block is updated by a source that bypasses the
cache. In that case, a check is made to determine whether the block is
currently in the cache. If it is, its valid bit is cleared to 0.
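The division of an address into tag, set and word fields described in this note can be sketched as follows. This is an illustrative sketch, not code from the text: it assumes the unit's running example of a 16-bit main memory address, a 128-block cache and 16 words per block, so that the tag and set fields together occupy 12 bits.

```python
def split_address(addr, blocks_per_set, cache_blocks=128, words_per_block=16):
    """Split a 16-bit main-memory address into (tag, set, word) fields.

    Assumed parameters: 128-block cache, 16 words per block, so the
    word field is 4 bits and tag + set together occupy 12 bits.
    """
    num_sets = cache_blocks // blocks_per_set
    set_bits = num_sets.bit_length() - 1          # log2(number of sets)
    word_bits = words_per_block.bit_length() - 1  # log2(words per block) = 4

    word = addr & (words_per_block - 1)           # offset within the block
    block = addr >> word_bits                     # 12-bit block number
    set_index = block & (num_sets - 1)            # which set the block maps to
    tag = block >> set_bits                       # remaining high-order bits
    return tag, set_index, word

# Two blocks per set: 6-bit set field and 6-bit tag, as in figure 10.8.
tag, set_index, word = split_address(0xABCD, blocks_per_set=2)
```

With one block per set this degenerates to direct mapping (7-bit block field, 5-bit tag), and with 128 blocks per set to fully associative mapping (a single 12-bit tag), matching the extreme cases above.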
Replacement algorithms
For any set associative mapping a replacement algorithm is needed. The
most common algorithms are discussed here. When a new block is to be
brought into the cache and all the positions that it may occupy are full, then
the cache controller must decide which of the old blocks to overwrite.
Because programs usually stay in localized areas for reasonable periods
of time, there is a high probability that blocks that have been
referenced recently will be referenced again soon. Therefore, when a block
is to be overwritten, the block that has not been referenced for the longest
time is overwritten. This block is called the least recently used (LRU) block,
and the technique is called the LRU replacement algorithm. In order to use
the LRU algorithm, the cache controller must track the LRU block as
computation proceeds.
There are several replacement algorithms that require less overhead than
LRU method. One method is to remove the oldest block from a full set when
a new block must be brought in. This method is referred to as FIFO. In this
technique no updating is needed when a hit occurs. However, because the
algorithm does not consider the recent pattern of access to blocks in the
cache, it is not as effective as the LRU approach in choosing the best block
to remove. Another method, called least frequently used (LFU), replaces the
block in the set that has experienced the fewest references; it is
implemented by associating a counter with each slot. The simplest algorithm
of all, random replacement, chooses the block to be overwritten at random.
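The LRU policy described above can be sketched in a few lines. This is a toy simulation of a single cache set (the names are ours, not from the text): an ordered dictionary keeps the resident blocks in recency order, so the least recently used block is always at the front.

```python
from collections import OrderedDict

def simulate_lru(block_refs, set_size):
    """Simulate LRU replacement within one cache set.

    On a hit the block is marked most recently used; on a miss with the
    set full, the least-recently-used block is overwritten.
    Returns the number of misses.
    """
    cache = OrderedDict()   # iteration order: least recent first
    misses = 0
    for block in block_refs:
        if block in cache:
            cache.move_to_end(block)       # hit: now most recently used
        else:
            misses += 1
            if len(cache) == set_size:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return misses
```

For the reference string 1, 2, 3, 1, 4 and a set of three blocks, the final reference to block 4 evicts block 2, the least recently used, rather than block 1, which FIFO would have chosen.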
10.3 External Memory
Magnetic Disk
A disk is a circular platter constructed of metal or of plastic coated with a
magnetic material. Data are recorded on and later retrieved from the disk
via a conducting coil called the head. During a read or write operation, the
head is stationary while the platter rotates beneath it. Writing is achieved by
producing a magnetic field that records a magnetic pattern on the
magnetic surface.
Data Organization and Formatting
Figure 10.9 depicts the data layout of a disk. The head is capable of reading
from or writing to the portion of the platter rotating beneath it. This gives
rise to the organization of data on the platter in a concentric set of rings
called tracks. Each track is the same width as the head. Adjacent tracks are
separated by gaps that minimize errors due to misalignment of the head. Data
is transferred to and from the disk in blocks, and a block is smaller than the
capacity of a track. Data is stored in block-sized regions, each an angular
part of a track, referred to as sectors. Typically there are 10-100 sectors
per track, which may be of either fixed or variable length.
Figure 10.9: Disk Data Layout
Physical Characteristics
Head motion : Fixed-head disk (one head per track) or movable-head disk
(one head per surface).
Disk portability : Nonremovable disk vs. removable disk.
Sides : Double-sided vs. single-sided.
Platters : Single-platter vs. multiple-platter disks.
Head mechanism : Contact (floppy), fixed gap, or aerodynamic gap
(Winchester [= hard disk]).
Disk Performance Parameters
1. Seek time: Time required to move the disk arm (head) to the required
track. It can be estimated as
Ts = m × n + s
where Ts = estimated seek time, n = number of tracks traversed,
m = a constant that depends on the disk drive, and s = startup time.
2. Rotational delay: Time required for the disk to rotate until the wanted
sector is beneath the head; on average half a revolution, 1 / (2 r).
3. Transfer time: T = b / (r N)
where T = transfer time, b = number of bytes to be transferred,
N = number of bytes on a track, and r = rotation speed in revolutions per
second.
4. Access time: The total average access time is
Ta = Ts + (1 / 2 r) + (b / r N)
where Ts = average seek time.
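Putting the formulas above together, the average time to read b bytes can be computed as follows. The drive parameters in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def disk_access_time(Ts, r, b, N):
    """Average access time Ta = Ts + 1/(2r) + b/(rN), as defined above.

    Ts = average seek time (s), r = rotation speed (rev/s),
    b = bytes to transfer, N = bytes per track.
    """
    rotational_delay = 1 / (2 * r)   # half a revolution on average
    transfer_time = b / (r * N)      # T = b / (rN)
    return Ts + rotational_delay + transfer_time

# Hypothetical drive: 10 ms seek, 100 rev/s, 1000-byte tracks, 500-byte read:
# Ta = 0.010 + 0.005 + 0.005 = 0.020 s.
Ta = disk_access_time(Ts=0.010, r=100, b=500, N=1000)
```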
RAID
1. RAID is a set of physical disk drives viewed by the operating system as
a single logical drive.
2. Data are distributed across the physical drives of an array.
3. Redundant disk capacity is used to store parity information, which
guarantees data recoverability in case of a disk failure.
Optical memory and magnetic tape are the other two types of external memory.
10.4 Virtual memory
Virtual (or logical) memory is a concept that, when implemented by a
computer and its operating system, allows programmers to use a very large
range of memory or storage addresses for stored data. The computing
system maps the programmer's virtual addresses to real hardware storage
addresses. Usually, the programmer is freed from having to be concerned
about the availability of data storage.
In addition to managing the mapping of virtual storage addresses to real
storage addresses, a computer implementing virtual memory or storage also
manages storage swapping between active storage (RAM) and hard disk or
other high volume storage devices. Data is read in units called "pages" of
sizes ranging from a thousand bytes (actually 1,024 decimal bytes) up to
several megabytes in size. This reduces the amount of physical storage
access that is required and speeds up overall system performance.
The address specified by the program is called a logical address or virtual
address; the memory control circuitry translates it into an address that can
be used to access the physical memory. The set of virtual addresses
constitutes the virtual address space.
The mechanism that translates virtual addresses into physical addresses is
usually implemented by a combination of hardware and software
components. If a virtual address refers to a part of the program or data
space that is currently in the physical memory, then the contents of the
appropriate physical location in the main memory are accessed. Otherwise
the contents must first be brought into a suitable location in the main memory.
The mapping function is implemented by a special memory control unit
called the memory management unit. The mapping function can be changed
during program execution according to system requirements.
The simplest method of translation assumes that all programs and data are
composed of fixed-length units called pages. Each page consists of a block
of words that occupy contiguous locations in the main memory or in the
secondary storage. Pages normally range from 1K to 8K bytes in length.
They form the basic unit of information that is transferred between the main
memory and secondary storage devices.
Virtual address translation method
A virtual address translation method based on the concept of fixed length
pages is shown in Fig. 10.10. Each virtual address generated by the
processor is interpreted as a page number followed by a word number.
Information about the disk or the main memory is kept in a page table in the
main memory.
Fig. 10.10: Virtual memory address translation
[The figure shows the virtual address from the CPU interpreted as a page
number followed by a word number. The page number, added to the contents of
the page table base register, selects an entry in the page table (implemented
in the main memory); each entry holds control bits together with the block
number if the page is present in the main memory, or a pointer to secondary
storage if it is not. The block number and word number together form the
physical address sent to the main memory.]
The starting address of this table is kept in a page table base register. By
adding the page number to the contents of this register, the address of the
corresponding entry in the page table is obtained. The contents of this
location give the starting address of the page if that page currently resides
in the main memory; otherwise they indicate the location where the page is
to be found in the secondary storage. In the latter case the entry in the table
usually points to an area in the main memory where the secondary storage
address of the page is held. Each entry also includes some control bits that
describe the status of the page while it is in the main memory. One control
bit indicates whether the page has been modified while in the main memory.
If the page table is stored in the main memory unit, then two main
memory accesses must be made for every main memory access requested
by the program. This may degrade speed by a factor of two.
However, most systems use a specialized cache memory to speed up the
translation process by storing recently used virtual-to-physical
address translations.
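The translation procedure of Fig. 10.10 can be sketched as below. This is a simplified model: the 4K-byte page size and the table layout are assumptions for illustration (the text allows 1K to 8K bytes), and a real memory management unit would also consult the control bits and a translation cache.

```python
PAGE_SIZE = 4096   # assumed page length; the text allows 1K to 8K bytes

def translate(vaddr, page_table):
    """Map a virtual address to a physical address as in Fig. 10.10.

    page_table maps a page number to (in_memory, block_number); when the
    page is not in main memory, a real system would follow the entry's
    pointer to secondary storage and bring the page in (a page fault).
    """
    page_number = vaddr // PAGE_SIZE   # high-order field of the address
    word_number = vaddr % PAGE_SIZE    # low-order field (offset in page)
    in_memory, block_number = page_table[page_number]
    if not in_memory:
        raise LookupError("page fault on page %d" % page_number)
    return block_number * PAGE_SIZE + word_number

# Page 0 resides in main memory block 5; page 1 is only on secondary storage.
table = {0: (True, 5), 1: (False, None)}
```

Accessing an address in page 0 succeeds immediately, while an access in page 1 would trigger the page fault path.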
Virtual memory increases the effective size of the main memory. Only the
active part of the virtual address space is mapped onto locations in the
physical main memory, whereas the remaining virtual addresses are
mapped onto the bulk storage devices used. During a memory cycle, the
address mapping mechanism (hardware or software) determines whether
the addressed information is in the physical main memory unit. If it is, the
proper information is accessed and execution proceeds. If it is not, a
contiguous block of words containing the desired information is transferred
from the bulk storage to the main memory, displacing some block that is
currently inactive.
10.5 Memory Management in Operating Systems
In the virtual memory discussion we assumed that only one large program is
being executed. If the program and data do not fit into the physical main
memory, then secondary storage holds the overflow, and the operating
system automatically swaps programs between the main memory and
secondary storage.
Memory management routines are part of the operating system of the computer.
The virtual address space is divided into two parts: system space and user
space. Operating system routines reside in the system space and user
application programs reside in the user space.
In a multiuser environment each user has a separate user space with a
separate page table. The memory management unit uses the page table base
register to determine the address of the table to be used in the translation
process. Hence, by changing the contents of this register, the operating
system can switch from one space to another. The physical main memory is
thus shared by the active pages of the system space and several user spaces.
However, only the pages that belong to one of these spaces are accessible
at any given time. In a multiuser environment, no program should be
allowed to modify either the data or the instructions of other programs in
the main memory. Hence protection must be provided.
Such protection can be provided in several ways. Let us first consider the
most basic form of protection. Recall that in the simplest case the processor
has two states, namely the supervisor state and the user state. The processor
is placed in the supervisor state while operating system routines are being
executed, and in the user state to execute user programs. In the user state
some machine instructions, called privileged instructions, cannot be
executed; these include operations such as modifying the page table base
register. Hence a user program is
prevented from accessing the page tables of other user spaces or of the
system space.
It is sometimes desirable for one application program to have access to
certain pages belonging to another program. The operating system can
arrange this by causing these pages to appear in both spaces. The shared
pages will therefore have entries in two different page tables. The control
bits in each table entry can be set differently to control the Read/Write
access privileges granted to each program. The main memory partitioning
methods are fixed-size partitions, unequal-size partitions, and dynamic
partitioning. Paging is more efficient than partitioning: programs are divided
into "logical" chunks known as "pages", which are assigned to available
chunks of memory known as "frames" or "page frames".
10.6 Summary
An intermediate memory called cache memory can be introduced between the
main memory and the CPU. We have introduced the reader to the basic
structure of cache memory along with the mapping functions and
replacement algorithms. Finally, we have touched upon a few external
memory elements and on virtual memory, another method of increasing the
effective size of the memory.
Self Assessment Questions
1. ____________ memory contains the copy of portions of main memory.
2. In ____________ design both CPU and main memory are directly
connected to the system bus.
3. Efficiency of the system that uses cache is ____________, when all the
references are confined to the cache.
4. Data are recorded on and later retrieved from the disk via a conducting
coil named ____________.
5. User application programs reside in the ____________.
10.7 Terminal Questions
1. Explain the replacement algorithms.
2. Discuss the physical characteristics of DISK.
3. Write a note about RAID.
10.8 Answers
Self Assessment Questions
1. cache
2. look-aside design
3. maximum
4. head
5. user space
Terminal Questions
1. Refer Section 10.2
2. Refer Section 10.3
3. Refer Section 10.3
Unit 11 Input / Output Basics
Structure:
11.1 Introduction
Objectives
11.2 External Devices
Classification of external devices
Input / Output problems
11.3 Input / Output Module
I/O Module Function
I/O Module Decisions
Input Output Techniques
11.4 Programmed I/O
I/O commands
I/O instructions
11.5 Interrupt Driven I/O
Basic concepts of an Interrupt
Response of CPU to an Interrupt
Design Issues
Priorities
Interrupt handling
Types of Interrupts
11.6 Summary
11.7 Terminal Questions
11.8 Answers
11.1 Introduction
In addition to the CPU and a set of memory modules, another key element
of a computer system is the set of I/O modules. An I/O module is not just a
mechanical connector that wires a device into the system bus; it contains
some intelligence, that is, logic for performing a communication
function between the peripheral and the bus. We need an I/O module for the
following reasons:
- There is a wide variety of peripherals with various methods of operation.
It is impractical to incorporate the necessary logic within the CPU to
control a range of devices.
- The data transfer rate of peripherals is often much slower than that of
the memory or CPU.
- Peripherals often use different data formats and word lengths than the
computer system to which they are attached.
An I/O module has two major functions:
1) Interface to the CPU and memory via the system bus or central switch
2) Interface to one or more peripheral devices by tailored data links.
Objectives:
By the end of Unit 11, the learners should be able to:
1. Discuss the classification of external devices.
2. Discuss I/O problems and the functions of I/O modules.
3. Explain the concept of programmed I/O.
4. Explain the various I/O instructions.
5. Explain the concept of interrupt driven I/O.
11.2 External Devices
I/O operations are accomplished through a wide assortment of external
devices that provide a means of exchanging data between the external
environment and the computer. An external device attaches to the computer
by a link to an I/O module as shown in figure 11.1. The link is used to
exchange control, status and data between the I/O module and the external
device. An external device connected to an I/O module is often referred to
as a peripheral device or simply a peripheral.
Figure 11.1: Generic Model of an I/O Module
Classification of External devices
External devices can be broadly classified into three categories:
1. Human readable: suitable for communicating with the computer user.
Examples: screen, keyboard, video display terminals (VDT) and printers.
2. Machine readable: suitable for communicating with equipment.
Examples: magnetic disk and tape systems, and the monitoring and control
sensors and actuators used in robotics.
From a functional point of view these devices are part of the memory
hierarchy, but from a structural point of view they are controlled
by I/O modules.
3. Communication: these devices allow a computer to exchange data with
remote devices, which may be machine readable or human readable.
Examples: modem, Network Interface Card (NIC).
The nature of an external device and its interface is indicated in figure 11.2.
The interface to the I/O module takes the form of control, status and data
signals. Data are a set of bits sent to and received from the I/O module.
Control signals determine the function that the device will perform, such as
send data to the I/O module (INPUT or READ), accept data from the I/O
module (OUTPUT or WRITE), report status, or perform some control function
particular to that device. Status signals indicate the state of the device,
such as READY or NOT-READY, to show whether the device is ready for data
transfer.
Control logic associated with the device controls the device's operation in
response to direction from the I/O module. The transducer converts data
from electrical signals into other forms of energy, or vice versa.
Typically a buffer is associated with the transducer to temporarily hold
data being transferred between the I/O module and the external device.
Input / Output Problems
- Wide variety of peripherals
- Delivering different amounts of data
- At different speeds
- In different formats
- All slower than CPU and RAM
Hence the need for I/O modules.
Figure 11.2: An External Device
[The figure shows an external device built from control logic, a buffer and
a transducer. Control signals arrive from, and status signals return to, the
I/O module; data bits pass to and from the I/O module; device-unique data
passes to and from the environment.]
11.3 Input / Output Module
It is the entity within a computer that is responsible for the control of one or
more external devices and for the exchange of data between those devices
and main memory and/or CPU.
Figure 11.3: I/O Module Diagram
[The figure shows the I/O module between a systems bus interface and an
external device interface. Data, address and control lines from the system
bus connect to the input/output logic, a data register and a status/control
register; external device interface logic exchanges data, status and control
signals with each attached device.]
The block diagram of an I/O module is shown in figure 11.3. Thus an I/O
module must have
- an interface to the CPU and memory, and
- an interface to one or more peripherals.
I/O Module Function
The major functions or requirements of an I/O module fall into the following
five categories:
- Control and timing
- CPU communication
- Device communication
- Data buffering
- Error detection
During any period of time, the CPU may communicate with one or more
external devices in unpredictable patterns, depending on the program's need
for I/O. The internal resources, such as main memory and the CPU, must be
shared among a number of activities, including handling data I/O. Thus the
I/O module includes a control and timing requirement to coordinate the flow
of traffic between internal resources and external devices. For example,
the CPU might be involved in a sequence of operations like:
- CPU checks I/O module device status
- I/O module returns device status
- If ready, CPU requests data transfer
- I/O module gets data from device
- I/O module transfers data to CPU
- Variations apply for output, DMA, etc.
The I/O module must have the capability to engage in communication with the
CPU and with the external device. CPU communication involves:
- Command decoding: the I/O module accepts commands from the CPU,
carried on the control bus.
- Data: data are exchanged between the CPU and the I/O module over the
data bus.
- Status reporting: because peripherals are slow, it is important to know
the status of the I/O module, which it can report with status signals.
Commonly used status signals are BUSY and READY. Various other
status signals may be used to report error conditions.
- Address recognition: just as each memory word has an address, there is
an address associated with every I/O device. Thus the I/O module must
recognize a unique address for each peripheral it controls.
The I/O module must also be able to perform device communication. This
communication involves commands, status information, and data. Some of
the essential tasks are listed below:
- Data buffering: the transfer rate into and out of the main memory or CPU
is quite high, while the rate is much lower for most peripherals. The
table 11.1 gives the data rate of various devices. Data coming from main
memory are sent to an I/O module in a rapid burst. The data is buffered
in the I/O module and then sent to the peripheral device at its rate. In the
opposite direction data are buffered so as not to tie up the memory in a
slow transfer operation. Thus the I/O module must be able to operate at
both device and memory speeds.
Table 11.1: Data rates of various devices
- Error detection: the I/O module is often responsible for error detection
and for subsequently reporting errors to the CPU. Two classes of errors
can be identified:
1. Mechanical and electrical malfunctions: errors reported by the device
itself, such as a paper jam or a bad disk track.
2. Unintentional changes to bit patterns: as data is transmitted from the
device to the I/O module, bit patterns might change unintentionally.
Some kind of error-detecting code is often used to detect such
transmission errors.
A common example is the use of a parity bit on each character of data. For
example, an ASCII character code occupies 7 bits of a byte. The eighth bit is
set so that the total number of 1s in the byte is even (even parity) or odd
(odd parity). When a byte is received, the I/O module checks the parity to
determine whether an error has occurred.
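The even-parity scheme described above can be sketched as follows (the function names are ours, for illustration):

```python
def add_even_parity(char):
    """Attach an even-parity bit (bit 7) to a 7-bit ASCII code so that
    the total number of 1s in the byte is even, as described above."""
    code = ord(char) & 0x7F
    parity = bin(code).count("1") & 1    # 1 if the 7-bit code has odd weight
    return code | (parity << 7)

def parity_ok(byte):
    """Receiver-side check: a valid even-parity byte has an even number
    of 1s; any single flipped bit makes the count odd."""
    return bin(byte & 0xFF).count("1") % 2 == 0
```

For example, 'C' (1000011, three 1s) is transmitted as 11000011; if any single bit is flipped in transit, parity_ok detects the error. Note that two flipped bits cancel out, which is why parity catches only odd numbers of bit errors.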
I/O Module Decisions
- Hide or reveal device properties, such as timing details and formats, to
the CPU.
- Support multiple devices or a single device.
- Control device functions, or leave control to the CPU.
- There are also operating system decisions; e.g. Unix treats everything it
can as a file.
Consider the problem of moving a character code from the keyboard to the
processor.
Striking a key stores the corresponding character code in an 8-bit buffer
register associated with the keyboard. Let us call this register DATAIN, as
shown in figure 11.4.
Figure 11.4: Example of an I/O module
To inform the processor that a valid character is in DATAIN, a status control
flag, SIN, is set to 1. A program monitors SIN and when SIN is set to 1, the
processor reads the contents of DATAIN. When the character is transferred
to the processor, SIN is automatically cleared to 0. If a second character is
entered at the keyboard, SIN is again set to 1 and the process repeats.
An analogous process takes place when characters are transferred from the
processor to the display. A buffer register, DATAOUT and a status control
flag, SOUT are used for this transfer. When SOUT equals 1, the display is
ready to receive a character. Under program control, the processor monitors
SOUT, and when SOUT is set to 1, the processor transfers a character code
to DATAOUT. The transfer of a character to DATAOUT clears SOUT to 0;
when the display device is ready to receive a second character, SOUT is
again set to 1. The buffer registers DATAIN and DATAOUT and the status
flags SIN and SOUT are part of circuitry commonly known as a device
interface.
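The DATAIN/SIN handshake just described can be sketched as a simulation. This is a toy model — the class and its internals are our invention, standing in for the device interface circuitry — but the polling loop mirrors the protocol: wait until SIN is 1, read DATAIN, and the read automatically clears SIN to 0.

```python
class KeyboardInterface:
    """Toy model of the DATAIN buffer register and SIN status flag."""

    def __init__(self, keystrokes):
        self._pending = list(keystrokes)  # keys the user will strike
        self.SIN = 0
        self.DATAIN = 0
        self._next_key()

    def _next_key(self):
        if self._pending:                 # striking a key stores its code
            self.DATAIN = ord(self._pending.pop(0))
            self.SIN = 1                  # ... and sets SIN to 1

    def read_datain(self):
        code = self.DATAIN                # processor reads the character,
        self.SIN = 0                      # which clears SIN to 0
        self._next_key()                  # next keystroke, if any
        return code

def read_by_polling(kbd):
    """Program-controlled input: busy-wait on SIN, then read DATAIN."""
    received = []
    while kbd._pending or kbd.SIN:
        while kbd.SIN == 0:               # the wait loop of the text
            pass                          # (never spins in this toy model)
        received.append(chr(kbd.read_datain()))
    return "".join(received)
```

The analogous output path would busy-wait on SOUT before writing DATAOUT.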
An I/O module that takes on most of the detailed processing burden,
presenting a high-level interface to the CPU, is usually referred to as an
I/O channel or I/O processor. An I/O module that is primitive and requires
detailed control is usually referred to as an I/O controller or device
controller. I/O controllers are seen on microcomputers and I/O channels on
mainframes, with minicomputers employing a mixture.
Input Output Techniques
Three techniques are possible for I/O operations:
- Programmed I/O
- Interrupt driven I/O
- Direct Memory Access (DMA)
Table 11.2 indicates the relationship among the three techniques.
With programmed I/O, data are exchanged between the CPU and the I/O
module. The CPU executes a program that gives it direct control of the I/O
operation, including sensing device status, sending a read or write
command, and transferring data. When the CPU issues a command to the I/O
module, it must wait until the I/O operation is complete. If the CPU is
faster than the I/O module, CPU time is wasted.
With interrupt driven I/O, the CPU issues a command to the I/O module and
does not wait until the I/O operation is complete; instead it continues to
execute other instructions. When the I/O module has completed its work, it
interrupts the CPU.
With both programmed I/O and interrupt driven I/O, the CPU is
responsible for extracting data from main memory for output and storing
data in main memory for input.
Table 11.2: I/O Techniques

                                    No Interrupts      Use of Interrupts
I/O-to-memory transfer
  through processor                 Programmed I/O     Interrupt driven I/O
Direct I/O-to-memory transfer                          Direct Memory Access (DMA)
11.4 Programmed I/O
When the CPU is executing a program and encounters an instruction
relating to I/O, it executes that instruction by issuing a command to the
appropriate I/O module. With programmed I/O, the I/O module performs the
requested action and then sets the appropriate bits in the status register.
The I/O module takes no further action to alert the CPU; that is, it does
not interrupt the CPU. Hence it is the responsibility of the CPU to
periodically check the status of the I/O module until it finds that the
operation is complete.
The sequence of actions that takes place with programmed I/O is:
- CPU requests I/O operation
- I/O module performs operation
- I/O module sets status bits
- CPU checks status bits periodically
- I/O module does not inform CPU directly
- I/O module does not interrupt CPU
- CPU may wait or come back later
I/O commands
To execute an I/O-related instruction, the CPU issues an address, specifying
the particular I/O module and external device, and an I/O command. Four
types of I/O commands can be received by the I/O module when it is
addressed by the CPU:
- A control command is used to activate a peripheral and tell it what to do.
For example, a magnetic tape may be directed to rewind or to move forward
a record.
- A test command is used to test various status conditions associated
with an I/O module and its peripherals: the CPU wants to know whether the
peripheral of interest is available for use, whether the most recent I/O
operation is complete, and whether any errors have occurred.
- A read command causes the I/O module to obtain an item of data from the
peripheral and place it in an internal buffer. The CPU then gets the data
item by requesting that the I/O module place it on the data bus.
- A write command causes the I/O module to take an item of data from the
data bus and subsequently transmit it to the peripheral.
A flow chart for the input of a block of data using programmed I/O is shown
in figure 11.5; it reads in a block of data from a peripheral device into
memory.
Fig. 11.5: Input of a block of data using programmed I/O
Data is read in one word at a time. For each word read in, the CPU keeps
checking the status until the word is available in the I/O module's data
register. That is, the possible sequence of I/O commands and actions is:
- CPU issues an address, identifying the module (and the device, if there
is more than one per module)
- CPU issues a command:
  - Control: telling the module what to do, e.g. spin up the disk
  - Test: check status, e.g. power on? error?
  - Read/Write: the module transfers data via its buffer from/to the device
From the flow chart it is clear that the main disadvantage of this technique
is that it is a time-consuming process that keeps the processor busy
unnecessarily.
I/O instructions
Addressing I/O Devices
Under programmed I/O, data transfer is very much like memory access from
the CPU's viewpoint. Each device is given a unique identifier, and CPU
commands contain that identifier (address).
I/O Mapping
When the CPU, main memory, and I/O module share a common bus, two
modes of addressing are possible.
1. Memory mapped I/O
- Devices and memory share an address space
- I/O looks just like memory read/write
- No special commands are needed for I/O
- The large selection of memory access commands is available
2. Isolated I/O
- Separate address spaces
- Needs I/O or memory select lines
- Special commands for I/O, offering only a limited set
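Memory-mapped I/O can be illustrated with a small simulation. Here a toy address space (our invention, with 0xFF00 chosen arbitrarily as the display's DATAOUT register) shows why no special I/O commands are needed: the very same store operation that writes RAM drives the device.

```python
DATAOUT_ADDR = 0xFF00   # hypothetical address assigned to the device register

class MappedBus:
    """Toy memory-mapped address space: ordinary RAM everywhere except
    that a store to DATAOUT_ADDR goes to the display, not to memory."""

    def __init__(self):
        self.ram = {}
        self.displayed = []

    def store(self, addr, value):
        if addr == DATAOUT_ADDR:               # looks like any memory write...
            self.displayed.append(chr(value))  # ...but reaches the device
        else:
            self.ram[addr] = value

    def load(self, addr):
        return self.ram.get(addr, 0)

bus = MappedBus()
for ch in "OK":
    bus.store(DATAOUT_ADDR, ord(ch))  # the same Store used for memory
bus.store(0x0100, 42)                 # an ordinary memory write
```

Under isolated I/O, by contrast, the device register would live in a separate address space reached only through special I/O instructions.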
Consider an example: the processor can monitor the keyboard status flag
SIN and transfer a character from DATAIN to register R1 using the following
sequence of operations:
READWAIT Branch to READWAIT if SIN = 0
Input from DATAIN to R1
The Branch operation is usually implemented by two machine instructions.
The first instruction tests the status flag and the second performs the branch.
Although the details vary from computer to computer, the main idea is that
the processor monitors the status flag by executing a short wait loop and
proceeds to transfer the input data when SIN is set to 1 as a result of a key
being struck. The Input operation resets SIN to 0.
Another example is transferring output to the display. The sequence of
instructions is:
WRITEWAIT Branch to WRITEWAIT if SOUT = 0.
Output from R1 to DATAOUT
Again, the Branch operation is normally implemented by two machine
instructions. The wait loop is executed repeatedly until the status flag SOUT
is set to 1 by the display when it is free to receive a character. The Output
operation transfers a character from R1 to DATAOUT to be displayed, and it
clears SOUT to 0.
We assume that the initial state of SIN is 0 and the initial state of SOUT is 1.
This initialization is normally performed by the device control circuits when
the devices are placed under computer control before program execution
begins. Until now, we have assumed that the addresses issued by the
processor to access instructions and operands always refer to memory
locations. Many computers use an arrangement called memory-mapped I/O
in which some memory address values are used to refer to peripheral
device buffer registers, such as DATAIN and DATAOUT.
Thus, no special instructions are needed to access the contents of these
registers; data can be transferred between these registers and the
processor using instructions that we have already discussed, such as Move,
Load or Store. For example, the contents of the keyboard character buffer
DATAIN can be transferred to register R1 in the processor by the instruction
MoveByte DATAIN, R1
Similarly, the contents of register R1 can be transferred to DATAOUT by the
instruction
MoveByte R1, DATAOUT
The status flags SIN and SOUT are automatically cleared when the buffer
registers DATAIN and DATAOUT are referenced, respectively. The MoveByte
operation code signifies that the
operand size is a byte, to distinguish it from the operation code Move that
has been used for word operands. We have established that the two data
buffers may be addressed as if they were two memory locations. It is
possible to deal with the status flags SIN and SOUT in the same way, by
assigning them distinct addresses. However, it is more common to include
SIN and SOUT in device status registers, one for each of the two devices.
Let us assume that bit b3 in registers INSTATUS and OUTSTATUS
corresponds to SIN and SOUT, respectively. The read operation just
described may now be implemented by the machine instruction sequence
READWAIT  Testbit #3, INSTATUS
          Branch=0 READWAIT
          MoveByte DATAIN, R1
The write operation may be implemented as:
WRITEWAIT Testbit #3, OUTSTATUS
          Branch=0 WRITEWAIT
          MoveByte R1, DATAOUT
The Testbit instruction tests the state of one bit in the destination location,
where the bit position to be tested is indicated by the first operand. If the bit
tested is equal to 0, then the condition of the branch instruction is true, and
a branch is made to the beginning of the wait loop. When the device is
ready, that is, when the bit tested becomes equal to 1, the data are read
from the input buffer or written into the output buffer.
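The wait-loop idea can be sketched in Python. This is a minimal simulation, not real device code: the `KeyboardDevice` class, its `SIN` flag and `DATAIN` buffer are hypothetical stand-ins for the hardware registers described above, and reading `DATAIN` clears `SIN` just as the text specifies.

```python
# Sketch of program-controlled (polling) I/O, assuming a hypothetical
# keyboard model with an SIN status flag and a DATAIN buffer register.

class KeyboardDevice:
    def __init__(self, chars):
        self._chars = list(chars)
        self.SIN = 1 if self._chars else 0   # 1 => a character is ready

    @property
    def DATAIN(self):
        ch = self._chars.pop(0)              # reading DATAIN resets SIN
        self.SIN = 1 if self._chars else 0
        return ch

def read_char(kbd):
    while kbd.SIN == 0:                      # READWAIT: branch while SIN = 0
        pass                                 # busy-wait (wastes CPU time)
    return kbd.DATAIN                        # Input from DATAIN to "R1"

kbd = KeyboardDevice("A")
print(read_char(kbd))                        # -> A
```

The `while` loop corresponds to the two-instruction Testbit/Branch pair: the processor does nothing useful until the flag is set.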
The program shown in Figure 11.6 uses these two operations to read a line
of characters typed at a keyboard and send them out to a display device. As
the characters are read in, one by one, they are stored in a data area in the
memory and then echoed back out to the display. The program finishes
when the carriage return character, CR, is read, stored and sent to the
display. The address of the first byte location of the memory data area
where the line is to be stored is LOC. Register R0 is used to point to this
area, and it is initially loaded with the address LOC by the first instruction in
the program. R0 is incremented for each character read and displayed by
the Autoincrement addressing mode used in the Compare instruction.
Figure 11.6: Program to read from a keyboard and write output to a display
device
11.5 Interrupt Driven I/O
Using Program-controlled I/O requires continuous involvement of the
processor in the I/O activities. It is desirable to avoid wasting processor
execution time. An alternative is for the CPU to issue an I/O command to a
module and then go on with other work. The I/O module then interrupts the
CPU to request service when it is ready to exchange data. The CPU
executes the data transfer and then resumes its former processing.
Because it is based on interrupts, this technique improves the utilization of
the processor.
An interrupt is more than a simple mechanism for coordinating I/O transfers.
In a general sense, interrupts enable transfer of control from one program to
another to be initiated by an event that is external to a computer. Execution
of the interrupted program resumes after completion of execution of the
interrupt service routine. The concept of interrupts is useful in operating
systems and in many control applications where processing of certain
routines has to be accurately timed relative to the external events. The latter
type of application is generally referred to as real-time processing.
The operations listed can be better understood with an example to input a
block of data. A flow chart using this technique for input of a block of data is
as shown in figure 11.7.
Figure 11.7: Input of a block of data using interrupt-driven I/O
With interrupt-driven I/O, the CPU issues a read command and the I/O
module gets data from the peripheral while the CPU does other work. The
I/O module then interrupts the CPU; the CPU checks the status and, if there
is no error (that is, the device is ready), requests the data, which the I/O
module transfers. The CPU then reads the data and stores it in main
memory.
Thus interrupt-driven I/O:
Overcomes CPU waiting.
Requires no repeated CPU checking of the device.
Lets the I/O module interrupt only when it is ready.
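The sequence above can be sketched as a small Python simulation. The `IOModule` class and its callback-style "interrupt" are illustrative assumptions, not real hardware APIs; the point is that the CPU starts the operation, continues with other work, and the transfer completes only when the module signals readiness.

```python
# Sketch of interrupt-driven input, assuming a hypothetical I/O module
# that invokes a registered service routine when its data is ready.

class IOModule:
    def __init__(self):
        self.handler = None
        self.buffer = None

    def read_command(self, handler):
        self.handler = handler        # CPU issues read command, then continues

    def device_ready(self, data):     # device completes: raise "interrupt"
        self.buffer = data
        self.handler(self)            # transfer of control to the handler

memory = []

def service_routine(module):          # interrupt service routine
    memory.append(module.buffer)      # CPU reads data, stores it in memory

mod = IOModule()
mod.read_command(service_routine)     # CPU starts the I/O...
# ...CPU does other useful work here, no polling loop...
mod.device_ready("X")                 # device interrupts when ready
print(memory)                         # -> ['X']
```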
Basic concepts of an Interrupt
An interrupt is an exception condition in a computer system caused by an
event external to the CPU. Interrupts are commonly used in I/O operations
by a device interface (or controller) to notify the CPU that it has completed
an I/O operation.
An interrupt is indicated by a signal sent by the device interface to the CPU
via an interrupt request line (on an external bus). This signal notifies the
CPU that the signaling interface needs to be serviced. The signal is held
until the CPU acknowledges or otherwise services the interface from which
the interrupt originated.
Note: interrupts may be generated by interfaces or devices which are not
involved in I/O. For example, if a computer system contains a clock (typically
called a "timer") outside of the CPU, that clock may signal its "ticks" by
interrupts. However, exception conditions or signals internal to the CPU,
such as occur when there is an attempt to execute a non-existent instruction,
are typically called traps, not interrupts.
Response of CPU to an Interrupt
The CPU checks periodically to determine if an interrupt signal is pending.
This check is usually done at the end of each instruction, although some
modern machines allow for interrupts to be checked for several times during
the execution of very long instructions.
When the CPU detects an interrupt, it then saves its current state (at least
the PC and the Processor Status Register containing condition codes); this
state information is usually saved in memory. After the interrupt has been
serviced, this state information is restored in the CPU and the previously
executing software resumes execution as if nothing had happened.
How is an Interrupt Serviced?
The CPU runs a program called an interrupt handler. Interrupt handler
software may be general and be able to service all device interfaces, but
typically an interrupt handler is designed to service an interface for exactly
one type of device (e.g., a terminal or a disk).
The interrupt handler must make it possible for the software which was
executing in the CPU and which was “interrupted" to resume its execution
after the interrupt is serviced as if nothing had happened. Since the CPU
hardware saves only a minimal amount of information when it saves the
state of the CPU, the interrupt handler is responsible for saving and
restoring data such as the contents of the general purpose registers which it
uses.
Design Issues
How do you identify the module issuing the interrupt?
How do you deal with multiple interrupts?
i.e. an interrupt handler being interrupted
Identifying Interrupting Module
When the CPU determines that an interrupt has occurred, it must be
determined what device interface signaled the interrupt. Two primary
approaches are used to do this. In some systems, a general interrupt
handler queries each device interface. When one acknowledges positively
that it originated the interrupt, then that device is serviced.
A more efficient alternative approach is to use vectored interrupts. When
vectored interrupts are used, the interface signaling the interrupt will
“identify" itself by sending information (its "vector") to the CPU. That
information will permit the CPU to select the correct interrupt handler. In
actuality, that information may be a pointer to where the starting address of
the interrupt handler is stored.
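Vectored dispatch can be sketched as a table lookup. The handler names and vector numbers below are invented for illustration; in a real machine the table holds starting addresses of interrupt service routines, which Python functions stand in for here.

```python
# Sketch of vectored-interrupt dispatch: the vector supplied by the
# interface indexes a table of handlers (illustrative names only).

def disk_handler():
    return "disk serviced"

def terminal_handler():
    return "terminal serviced"

vector_table = {0: disk_handler, 1: terminal_handler}

def take_interrupt(vector):
    handler = vector_table[vector]    # vector selects the correct handler
    return handler()

print(take_interrupt(1))              # -> terminal serviced
```

No querying of each interface is needed: the vector alone identifies which routine to run.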
Multiple Interrupts
Each interrupt line has a priority
Higher priority lines can interrupt lower priority lines
If bus mastering is used, only the current bus master can interrupt
Priorities
Because several interrupts may be found pending when the CPU checks for
an interrupt, priorities may be used to determine in which order the
interrupts are serviced. When several interrupts are pending, the one of
highest priority is always serviced next. Priorities are assigned to device
interfaces on the basis of any of several factors.
Generally faster devices have higher priorities assigned to their interfaces;
disks transmit data at a faster rate than terminals so usually have higher
priorities. Also, if a device interface is not serviced in time, incoming data
may be written over by later data; thus devices such as sensors which do
not have a means to recover lost data (without manual intervention) may be
given fairly high priorities.
Priorities may be used to control the nesting of interrupts. In this case, the
CPU is itself assigned a priority depending on what is executing; typical user
programs usually run at the lowest priority, while an interrupt handler may
run at the priority assigned to the interface it is servicing. Interrupts whose
priority is lower than (or equal to) the current CPU priority are masked
out – they are simply not noticeable
to the CPU. This allows "more important" device interfaces to interrupt the
servicing of "less important" device interfaces, while "less important"
interrupts must await the completion of servicing "more important" ones.
Furthermore, device interfaces can be prevented from interrupting their own
interrupt handlers. In addition, all interrupts can be blocked out by raising
the CPU priority level to its highest possible value; this would allow for a
critical section of instructions to be executed without “interruption." However,
this type of mechanism should only be used for very short segments of code
and for very good reasons (e.g., during context switches).
Priorities may be implemented by providing several interrupt request lines in
the external bus; each device is then attached to the line corresponding to
the interrupt priority level to which it has been assigned. Furthermore, if
several devices are attached to the same interrupt request line via daisy
chaining, the one closest to the CPU will be serviced first.
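The masking rule can be captured in a few lines. This is a simplified model under one assumption stated here: an interrupt is accepted only if its priority is strictly higher than the current CPU priority, exactly as described above.

```python
# Sketch of priority masking: of the pending interrupts, only those with
# priority strictly higher than the CPU's current priority are eligible.

def accept(pending, cpu_priority):
    """Return the highest-priority unmasked pending interrupt,
    or None if every pending interrupt is masked out."""
    eligible = [p for p in pending if p > cpu_priority]
    return max(eligible) if eligible else None

print(accept([2, 5, 3], cpu_priority=3))   # -> 5    (5 > 3, so it preempts)
print(accept([1, 2], cpu_priority=4))      # -> None (all masked out)
```

Raising `cpu_priority` to the maximum value blocks every interrupt, which models the "critical section" mechanism mentioned above.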
Interrupt Handling
We have already introduced hardware interrupts, by which a hardware
device can interrupt the normal operation of the CPU to signal that some
significant or time-critical event has occurred (e.g., a character arriving over
a serial line or an alarm event). The complete cycle of hardware and
software operations involved in a typical interrupt, from signaling to
resolution, are listed below:
Hardware Actions
The device controller asserts an interrupt line on the system bus to start
the interrupt sequence.
As soon as the CPU is prepared to handle the interrupt, it asserts an
interrupt acknowledgement signal on the bus.
When the device controller sees that its interrupt signal has been
acknowledged it puts a small integer on the data lines to identify itself;
this is the "interrupt vector".
The CPU removes the interrupt vector from the bus and saves it
temporarily.
The CPU pushes the program counter and the Program Status Word
(PSW, or FLAGS register) onto the stack.
The CPU then locates a new program counter value using the interrupt
vector as an index into a (operating system) table at a well-defined point
in memory. This new program counter starts the interrupt service routine
(normally part of the device driver) for the device causing the interrupt.
Software Actions
The interrupt service routine saves all registers so that they can be
restored later. This may be on the stack or in a system table.
The exact device causing the interrupt is identified (e.g. by polling) -
each interrupt vector may in general be shared by a number of devices.
Any other information about the interrupt, e.g. status codes, can now be
read in.
If an I/O error has occurred, it can be handled here.
The interrupt is responded to in a device specific manner. **
If necessary, a special code may be output to the device or interrupt
controller to indicate that the interrupt has been handled.
The saved register values are restored.
Execute a "Return from Interrupt" instruction, which restores the PC and
PSW from when the interrupt occurred.
The program which was interrupted continues executing from exactly
where it left off.
The step (**) above is where the "real" work will be done, e.g. reading in a
character from a serial line into a local buffer or signaling to the rest of the
program that some error or significant change in status has occurred.
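The save/restore discipline around an interrupt can be sketched with a toy CPU state. The two-element `(pc, psw)` state and the `interrupt_and_resume` helper are illustrative assumptions; the essential point is that what the hardware pushes, the Return-from-Interrupt pops, so execution resumes exactly where it left off.

```python
# Sketch of state saving around an interrupt, assuming a toy CPU state
# of just (pc, psw). Push on entry, run the handler, pop on return.

stack = []

def interrupt_and_resume(state, isr):
    stack.append(state)               # hardware: push PC and PSW
    isr()                             # run the interrupt service routine
    return stack.pop()                # Return-from-Interrupt: restore PC, PSW

state = (1000, "flags")               # program executing at PC = 1000
resumed = interrupt_and_resume(state, lambda: None)
print(resumed == state)               # -> True: resumes where it left off
```

A real handler would additionally save and restore any general-purpose registers it uses, since the hardware saves only this minimal state.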
Figure 11.8: Connections between devices and the interrupt controller
actually use the bus lines instead of dedicated wires
Types of Interrupts
An interrupt is an event that causes the execution of one program to be
suspended and another program to be executed. So far we have assumed
that this event is a request from an I/O device, received in the form of a
hardware signal over the computer bus. In fact, there
are many uses for interrupts other than for controlling I/O transfers, some of
which we will now describe.
Recovery from errors
Computers use a variety of techniques to ensure that all hardware
components are operating properly. For example, many computers include
a parity check code in the main memory, which allows detection of errors in
the stored data. If an error occurs, the control hardware detects it and
informs the CPU by raising an interrupt. The CPU may also interrupt a
program if it detects an error or an unusual condition while executing the
instructions of this program. For example, the OP-code field of an instruction
may not correspond to any legal instruction, or some instruction may
attempt a division by zero.
When an interrupt is initiated as a result of such causes, the CPU proceeds
exactly in the same manner as in the case of an I/O interrupt request. It
suspends the program being executed and starts an interrupt service routine.
The interrupt service routine should take appropriate action to recover from
the error, if possible or inform the user about it.
Debugging
Another important use of interrupts is as an aid in debugging programs.
System software usually includes a program called a debugger, which helps
the programmer to find errors in the program. The debugger uses interrupts
to provide two important facilities: trace and break points. A trace facility
causes an interrupt to occur after execution of every instruction in a program
that is being debugged. The interrupt service routine for a trace interrupt
starts the execution of the debugging routine, which allows the user to
examine the contents of registers, memory locations and so on. On return
from the debugging routine, the next instruction is executed, then the
debugging routine is activated again. During the execution of the debugging
routine trace interrupts are disabled.
Break points provide a similar facility, except that the program being
debugged is interrupted at only specific points selected by the user. An
instruction called trap or software interrupt is usually provided for this
purpose. Execution of this instruction results in exactly the same actions as
when a hardware interrupt request is received. Thus if the user wishes to
interrupt a program after execution of the instruction i the debugging routine
replaces instruction i+1 with a software interrupt instruction. When the
program is executed and reaches that point, it is interrupted and the
debugging routine is activated. Later, when the user is ready to continue
executing the program being debugged, the debugging routine restores the
instruction that was at location i+1 and executes a Return-from-interrupt
instruction.
Communication between Programs
Software interrupt instructions are used by the operating system to
communicate with and control the execution of other programs.
11.6 Summary
An I/O module is a key element of a computer system. An I/O module has
two major functions: first one to interface with the CPU and memory via the
system bus or central switch and second to interface with one or more
peripheral devices by tailored data links. Basic I/O operations and
Programmed I/O, Interrupt-driven I/O, Direct memory access (DMA)
techniques are also discussed.
Programmed I/O does not interrupt the processor: the processor must
periodically check the status of the I/O module until the task is completed,
as illustrated by the character transfers between the processor and the
keyboard and display devices. Interrupt-driven I/O allows the processor to
do other tasks while waiting for the I/O device to send an interrupt, and is
therefore more efficient than programmed I/O.
Self Assessment Questions
1. External devices can be broadly classified into ___________ categories.
2. ___________ is a device that allows a computer to exchange data with
remote devices.
3. The data rate of a mouse is ___________.
4. In ___________ I/O, the I/O module doesn’t interrupt the CPU.
5. In ___________, devices and memory share an address space.
11.7 Terminal Questions
1. List the classification of external devices.
2. Explain I/O module in detail.
3. Explain the concept of programmed I/O.
4. Discuss the working principle of an interrupt.
5. Explain how the CPU responds to an interrupt.
11.8 Answers
Self Assessment Questions
1. two
2. modem or NIC
3. 100 bytes / sec
4. programmed
5. memory mapped I/O
Terminal Questions
1. Refer Section 11.2
2. Refer Section 11.3
3. Refer Section 11.4
4. Refer Section 11.5
5. Refer Section 11.5
Computer Organization and Architecture Unit 12
Sikkim Manipal University - DDE Page No. 297
Unit 12 Direct Memory Access
Structure:
12.1 Introduction
Objectives
12.2 Direct Memory Access
DMA Function and Operation
DMA Configurations
12.3 DMA Controller
DMA Transfer Types
DMA Transfer modes
DMA Controller Operation
Advantages
12.4 Synchronization Requirements for DMA and Interrupts
Synchronization with Interrupts
Synchronization with DMA
12.5 Summary
12.6 Terminal Questions
12.7 Answers
12.1 Introduction
Direct Memory Access capabilities are provided by some computer bus
architectures that allow data to be sent directly from an attached device
(such as a disk drive) to the memory on the computer's motherboard. The
microprocessor is freed from involvement with the data transfer, thus
speeding up overall computer operation.
Objectives:
By the end of Unit 12, the learners should be able to explain the DMA
function, operation and configurations.
12.2 Direct Memory Access
Interrupt driven and programmed I/O require active CPU intervention.
Transfer rate is limited.
CPU is tied up.
DMA is the answer.
Usually a specified portion of memory is designated as an area to be used
for direct memory access. In the ISA bus standard, up to 16 megabytes of
memory can be addressed for DMA. The EISA and Micro Channel
Architecture Standards allow access to the full range of memory addresses
(assuming they're addressable with 32 bits). Peripheral Component
Interconnect accomplishes DMA by using a bus master (with the
microprocessor "delegating" I/O control to the PCI controller).
DMA Function and Operation
The DMA module is as shown in figure 12.1, which is capable of mimicking
the CPU.
Figure 12.1: Typical DMA block diagram
Consider an example of input of a block of data using DMA access. The flow
chart is given in figure 12.2.
Figure 12.2: Input of a block of data using DMA access
When the CPU wishes to read or write a block of data, it issues a command
to the DMA module and gives the following information:
CPU tells DMA controller:-
Read/Write
Device address
Starting address of memory block for data
Amount of data to be transferred
The CPU carries on with other work.
Thus the DMA controller takes over the I/O work from the CPU.
The DMA module transfers the entire block of data,
One word at a time, directly to or from memory, without going through
CPU.
When the transfer is complete
DMA controller sends interrupt when finished
Thus CPU is involved only at the beginning and at the end of the transfer.
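The setup-and-interrupt pattern above can be sketched as follows. The `DMAController` and `Disk` classes and their method names are hypothetical; the sketch shows the four parameters the CPU supplies (direction, device, start address, count) and that the CPU is involved only at setup and at the completion interrupt.

```python
# Sketch of the CPU programming a hypothetical DMA controller, which
# then moves a whole block to memory and interrupts when finished.

class DMAController:
    def setup(self, direction, device, start, count, on_done):
        self.direction, self.device = direction, device
        self.addr, self.count, self.on_done = start, count, on_done

    def run(self, memory):                    # performed by the controller,
        for i in range(self.count):           # one word at a time, without
            memory[self.addr + i] = self.device.read_word()  # the CPU
        self.on_done()                        # interrupt when finished

class Disk:
    def __init__(self, words):
        self._words = iter(words)

    def read_word(self):
        return next(self._words)

memory = [0] * 8
done = []
dma = DMAController()
dma.setup("read", Disk([7, 8, 9]), start=2, count=3,
          on_done=lambda: done.append(True))
dma.run(memory)                               # CPU would be free during this
print(memory, done)                           # -> [0, 0, 7, 8, 9, 0, 0, 0] [True]
```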
DMA Configurations
The DMA mechanism can be configured in a variety of ways. Some of the
common configurations are discussed here.
1. Single Bus Detached DMA
In this configuration all modules share the same system bus. The block
diagram of a single bus detached DMA is as shown in figure 12.3. The DMA
module that is mimicking the CPU uses the programmed I/O to exchange
the data between the memory and the I/O module through the DMA module.
This scheme may be inexpensive but is clearly inefficient. The number of
bus cycles is quite substantial.
Features of this configuration are:
Single Bus, Detached DMA controller
Each transfer uses bus twice
I/O to DMA then DMA to memory
CPU is suspended twice
Figure 12.3: Block diagram of single bus detached DMA
2. Single Bus, integrated DMA
Here there is a path between DMA module and one or more I/O modules
that do not include the system bus. The block diagram of single bus
Integrated DMA is as shown in figure 12.4. The DMA logic can actually be
considered as part of an I/O module, or there may be a separate module
that controls one or more I/O modules.
Figure 12.4: Block diagram of single bus integrated DMA
The features of this configuration can be considered as:
Single Bus, Integrated DMA controller
Controller may support >1 device
Each transfer uses bus once
DMA to memory
CPU is suspended once
3. DMA using an I/O bus
One further step of the concept of integrated DMA is to connect I/O modules
to DMA controller using a separate bus called I/O bus. This reduces the
number of I/O interfaces in the DMA module to one and provides for an
easily expandable configuration. The block diagram of DMA using I/O bus is
as shown in figure 12.5. Here the system bus that the DMA shares with
CPU and main memory is used by DMA module only to exchange data with
memory. And the exchange of data between the DMA module and the I/O
modules takes place, off the system bus that is through the I/O bus.
Figure 12.5: Block diagram of DMA with I/O bus and system bus
The features of this configuration are:
Separate I/O Bus
Bus supports all DMA enabled devices
Each transfer uses bus once
DMA to memory
CPU is suspended once
12.3 DMA Controller
DMA has been a built-in feature of PC architecture since the introduction of
the original IBM PC. PC-based DMA was used for floppy disk I/O in the
original PC and for hard disk I/O in later versions. PC-based DMA
technology, along with high-speed bus technology, is driven by data storage,
communications and graphics needs – all of which require the highest rates
of data transfer between system memory and I/O devices. Data acquisition
applications have the same needs and therefore can take advantage of the
technology developed for larger markets. This section introduces DMA
controller terminology and explains the basic operation of a PC - based
DMA controller along with common modes of operation. Key terminology is
italicized.
A DMA controller is a device, usually peripheral to a computer's CPU, that is,
programmed to perform a sequence of data transfers on behalf of the CPU.
A DMA controller can directly access memory and is used to transfer data
from one memory location to another, or from an I/O device to memory and
vice versa. A DMA controller manages several DMA channels, each of
which can be programmed to perform a sequence of these DMA transfers.
Devices, usually I/O peripherals, that acquire data that must be read (or
devices that must output data and be written to) signal the DMA controller to
perform a DMA transfer by asserting a hardware DMA request signal. A
DMA request signal for each channel is routed to the DMA controller. This
signal is monitored and responded to in much the same way as a processor
handles interrupts. When the DMA controller sees a DMA request, the DMA
controller responds by performing one or many data transfers from that I/O
device into system memory or vice versa. Channels must be enabled by the
processor for the DMA controller to respond to DMA requests. The number
of transfers performed, transfer modes used and memory locations
accessed depend on how the DMA channel is programmed.
A DMA controller typically shares the system memory and I/O bus with the
CPU and has both bus master and slave capability. Figure 12.6 shows the
DMA controller architecture and how the DMA controller interacts with the
CPU. In bus master mode, the DMA controller acquires the system bus
(address, data and control lines) from the CPU to perform the DMA transfers.
Because the CPU releases the system bus for the duration of the transfer,
the process is sometimes referred to as cycle stealing. However, this term is
no longer appropriate in the context of the new high-performance
architectures that are appearing in personal computers. These architectures
use cache memory for the CPU, which enables the DMA controller to
operate in parallel with the CPU to some extent.
Figure 12.6: DMA controller
In bus slave mode, the DMA controller is accessed by the CPU, which
programs the DMA controller's internal registers to set up DMA transfers.
The internal registers consist of source and destination address registers
and transfer count registers for each DMA channel, as well as control and
status registers for initiating, monitoring and sustaining the operation of the
DMA controller.
DMA Transfer Types
DMA controllers vary as to the type of DMA transfers and the number of
DMA channels they support. The two types of DMA transfers are flyby DMA
transfers and fetch-and-deposit DMA transfers. The three common transfer
modes are single, block, and demand transfer modes. These DMA transfer
types and modes are described in the following sections.
1. Flyby DMA transfer:
It is the fastest DMA transfer type and is also referred to as a single-cycle
or single-address transfer. In a flyby DMA transfer, a single bus operation is
used to accomplish the transfer, with data read from the source and written to the
destination simultaneously. In flyby operation, the device requesting service
asserts a DMA request on the appropriate channel request line of the DMA
controller. The DMA controller responds by gaining control of the system
bus from the CPU and then issuing the pre-programmed memory address.
Simultaneously, the DMA controller sends a DMA acknowledge signal to the
requesting device. This signal alerts the requesting device to drive the data
onto the system data bus or to latch the data from the system bus,
depending on the direction of the transfer.
In other words, a flyby DMA transfer looks like a memory read or write cycle
with the DMA controller supplying the address and the I/O device reading or
writing the data. Because flyby DMA transfers involve a single memory cycle
per data transfer, these transfers are very efficient; however, memory-to-
memory transfers are not possible in this mode. Figure 12.7 shows the flyby
DMA transfer signal protocol.
Figure 12.7: Flyby DMA
2. Fetch-and-deposit DMA transfer:
The second type of DMA transfer is referred to as a dual-cycle, dual-
address, flow-through, or fetch-and-deposit DMA transfer. As these names
imply, this type of transfer involves two memory or I/O cycles. The data
being transferred is first read from the I/O device or memory into a
temporary data register internal to the DMA controller. The data is then
written to the memory or I/O device in the next cycle. Figure 12.8 shows the
fetch-and-deposit DMA transfer signal protocol. Although inefficient because
the DMA controller performs two cycles and thus retains the system bus
longer, this type of transfer is useful for interfacing devices with different
data bus sizes.
For example, a DMA controller can perform two 16-bit read operations from
one location followed by a 32-bit write operation to another location. A DMA
controller supporting this type of transfer has two address registers per
channel (source address and destination address) and bus-size registers, in
addition to the usual transfer count and control registers. Unlike the flyby
operation, this type of DMA transfer is suitable for both memory-to-memory
and I/O transfers.
Figure 12.8: Fetch-and-deposit DMA transfer
DMA Transfer Modes
In addition to DMA transfer types there are different transfer modes. Most
common transfer modes are single, block and demand modes.
Single transfer mode
Single transfer mode transfers one data value for each DMA request
assertion. This mode is the slowest method of transfer because it requires
the DMA controller to arbitrate for the system bus with each transfer. This
arbitration is not a major problem on a lightly loaded bus, but it can lead to
latency problems when multiple devices are using the bus.
Block transfer mode
For block mode transfers, the DMA controller performs the entire DMA
sequence as specified by the transfer count register at the fastest possible
rate in response to a single DMA request from the I/O device.
Demand transfer mode
For demand mode transfers, the DMA controller performs DMA transfers at
the fastest possible rate as long as the I/O device asserts its DMA request.
When the I/O device deasserts its DMA request, transfers are held off.
Block and demand transfer modes increase system throughput by allowing
the DMA controller to perform multiple transfers once it has gained the
bus.
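The practical difference between these modes is how often the controller must arbitrate for the bus. A rough illustrative model (the mode names follow the text; the function itself is hypothetical):

```python
# Hypothetical cost model: count how many times the DMA controller must
# arbitrate for the system bus to move `transfer_count` data items.
def count_arbitrations(transfer_count, mode, bursts=1):
    if mode == "single":
        return transfer_count          # re-arbitrate for every single datum
    if mode == "block":
        return 1                       # one grant covers the whole sequence
    if mode == "demand":
        return bursts                  # one grant per DMA-request burst
    raise ValueError("unknown mode")

print(count_arbitrations(256, "single"))             # 256
print(count_arbitrations(256, "block"))              # 1
print(count_arbitrations(256, "demand", bursts=4))   # 4
```

Single mode pays the arbitration cost on every transfer, which is why it is the slowest; block and demand modes amortize that cost over many transfers.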
DMA Controller Operation
For each channel, the DMA controller saves the programmed address and
count in the base registers and maintains copies of the information in the
current address and current count registers, as shown in Figure 12.6. Each
DMA channel is enabled and disabled via a DMA mask register. When DMA
is started by writing to the base registers and enabling the DMA channel, the
current registers are loaded from the base registers. With each DMA
transfer, the value in the current address register is driven onto the address
bus, and the current address register is automatically incremented or
decremented.
The current count register determines the number of transfers remaining
and is automatically decremented after each transfer. When the value in the
current count register goes from 0 to -1, a terminal count (TC) signal is
generated, which signifies the completion of the DMA transfer sequence.
This termination event is referred to as reaching terminal count. DMA
controllers often generate a hardware TC pulse during the last cycle of a
DMA transfer sequence. This signal can be monitored by the I/O devices
participating in the DMA transfers.
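The register behavior described above can be modeled with a short simulation. This is a simplified sketch in the spirit of 8237-style controllers; the class and method names are invented for illustration:

```python
# Illustrative model of one DMA channel. Enabling the channel loads the
# current registers from the base registers; each transfer drives the
# current address and decrements the current count; a 0 -> -1 transition
# of the count generates terminal count (TC).
class DMAChannel:
    def __init__(self, base_address, base_count):
        self.base_address = base_address
        self.base_count = base_count
        self.current_address = base_address
        self.current_count = base_count
        self.enabled = False               # DMA mask register bit

    def enable(self):
        self.current_address = self.base_address   # load current from base
        self.current_count = self.base_count
        self.enabled = True

    def transfer_one(self):
        """Perform one transfer; return True when terminal count is reached."""
        assert self.enabled, "channel is masked"
        self.current_address += 1          # address auto-increments
        self.current_count -= 1            # count decrements per transfer
        if self.current_count < 0:         # count went from 0 to -1: TC
            self.enabled = False
            return True                    # TC pulse on the last cycle
        return False

ch = DMAChannel(base_address=0x1000, base_count=2)   # count+1 = 3 transfers
ch.enable()
results = [ch.transfer_one() for _ in range(3)]
print(results)                  # [False, False, True] -- TC on the last transfer
print(hex(ch.current_address))  # 0x1003
```

Note the 8237-style convention that a programmed count of N yields N+1 transfers, because TC is generated when the count goes from 0 to -1.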
DMA controllers require reprogramming when a DMA channel reaches TC.
Thus, DMA controllers require some CPU time, but far less than is required
for the CPU to service device I/O interrupts. When a DMA channel reaches
TC, the processor may need to reprogram the controller for additional DMA
transfers. Some DMA controllers interrupt the processor whenever a
channel terminates. DMA controllers also have mechanisms for
automatically reprogramming a DMA channel when the DMA transfer
sequence completes. These mechanisms include auto initialization and
buffer chaining.
The auto initialization feature repeats the DMA transfer sequence by
reloading the DMA channel's current registers from the base registers at the
end of a DMA sequence and re-enabling the channel. Buffer chaining is
useful for transferring blocks of data into non-contiguous buffer areas or for
handling double-buffered data acquisition. With buffer chaining, a channel
interrupts the CPU and is programmed with the next address and count
parameters while DMA transfers are being performed on the current buffer.
Some DMA controllers minimize CPU intervention further by having a chain
address register that points to a chain control table in memory. The DMA
controller then loads its own channel parameters from memory. Generally,
the more sophisticated the DMA controller, the less servicing the CPU has
to perform.
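Auto-initialization can be sketched the same way: when terminal count is reached, the current registers are reloaded from the base registers with no CPU involvement (again, an illustrative model, not a real device interface):

```python
# Hypothetical sketch of auto-initialization: on terminal count, the
# current registers are reloaded from the base registers and the channel
# keeps running, with no CPU reprogramming between sequences.
class AutoInitChannel:
    def __init__(self, base_address, base_count):
        self.base_address, self.base_count = base_address, base_count
        self.current_address, self.current_count = base_address, base_count
        self.sequences_completed = 0

    def transfer_one(self):
        self.current_address += 1
        self.current_count -= 1
        if self.current_count < 0:                   # reached terminal count
            self.sequences_completed += 1
            # auto-initialize: reload current registers from base registers
            self.current_address = self.base_address
            self.current_count = self.base_count

ch = AutoInitChannel(0x2000, 3)    # 4 transfers per sequence (count + 1)
for _ in range(8):                 # enough for two full sequences
    ch.transfer_one()
print(ch.sequences_completed)      # 2
print(ch.current_count)            # 3 -- reloaded, ready for the next sequence
```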
A DMA controller has one or more status registers that are read by the CPU
to determine the state of each DMA channel. The status register typically
indicates whether a DMA request is asserted on a channel and whether a
channel has reached TC. Reading the status register often clears the
terminal count information in the register, which leads to problems when
multiple programs are trying to use different DMA channels.
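The read-clears-TC hazard mentioned above can be demonstrated with a toy model (the register layout is hypothetical):

```python
# Illustrative sketch of the read-clears-TC hazard: the status register
# is shared by all channels, and reading it clears every TC bit, so a
# second reader can miss its own channel's completion.
class DMAStatus:
    def __init__(self):
        self.tc_bits = 0                  # one TC bit per channel

    def set_tc(self, channel):
        self.tc_bits |= (1 << channel)

    def read(self):
        value = self.tc_bits
        self.tc_bits = 0                  # reading clears the TC information
        return value

status = DMAStatus()
status.set_tc(0)                          # channel 0 reaches TC
status.set_tc(2)                          # channel 2 reaches TC
first = status.read()                     # program A reads, sees both bits...
second = status.read()                    # ...program B now sees nothing
print(bin(first))   # 0b101
print(second)       # 0 -- channel 2's completion was lost to program B
```

This is exactly why two independent programs polling the same status register for different channels can interfere with each other.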
Advantages
DMA has several advantages over polling and interrupts. DMA is fast
because a dedicated piece of hardware transfers data from one computer
location to another and only one or two bus read/write cycles are required
per piece of data transferred. In addition, DMA is usually required to achieve
maximum data transfer speed, and is thus useful for high-speed data
acquisition devices. DMA also minimizes latency in servicing a data
acquisition device because the dedicated hardware responds more quickly
than interrupts and transfer time is short. Minimizing latency reduces the
amount of temporary storage (memory) required on an I/O device. DMA also
off-loads the processor, which means the processor does not have to
execute any instructions to transfer data. Therefore, the processor is not
used for handling the data transfer activity and is available for other
processing activity. Also, in systems where the processor primarily operates
out of its cache, data transfer is actually occurring in parallel, thus increasing
overall system utilization.
12.4 Synchronization Requirements for DMA and Interrupts
Software designers often have to work with data structures that are shared
with interrupt service routines (ISRs) or DMA devices. This requires
performing atomic updates to the shared critical regions.
Synchronization with Interrupts
When a data structure is shared with an ISR, disabling the interrupt to
execute the critical region updates is a good technique. Keep in mind that
disabling of interrupts should be restricted to only the code that updates the
critical region. Keeping the interrupts disabled for a long time will increase
the interrupt latency.
Another option is to make use of the fact that interrupts are processed at
instruction boundaries. A single instruction that performs read as well as
write could be used to perform an atomic transaction. For example, if your
processor supports direct memory increment, you could increment a shared
semaphore without disabling interrupts.
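Python has no way to disable hardware interrupts, so the following sketch uses a lock as a stand-in for the interrupt-disable window; the pattern of keeping the protected section as short as possible is the same:

```python
# Analogy only: the lock below stands in for the interrupt-disable
# window (e.g. CLI/STI on x86). The critical region is kept minimal so
# that "interrupts" are disabled for as little time as possible.
import threading

shared_counter = 0
interrupt_disable = threading.Lock()      # stand-in for disabling interrupts

def isr_like_update():
    global shared_counter
    with interrupt_disable:               # "interrupts disabled" window
        shared_counter += 1               # critical region: one short update

threads = [threading.Thread(target=isr_like_update) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_counter)   # 100 -- no updates lost
```

Without the lock, concurrent read-modify-write sequences could interleave and lose updates, which is precisely the hazard that disabling interrupts (or a single atomic instruction) prevents.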
Synchronization with DMA
Sharing data structures with a DMA device is tricky. A DMA operation can
begin at any bus cycle boundary, which means a new DMA operation can be
started in the middle of an instruction's execution (keep in mind that
executing one instruction involves multiple bus cycles).
The best mechanism for performing critical region updates is the read-
modify-write bus cycle. With such a cycle, updates to critical regions are
atomic because the read and the write are glued together in a single
indivisible bus transaction.
Another option is to disable DMA operation while the critical region is
updated; extreme caution should be used when employing this technique.
Some processors support disabling DMA by using locked bus cycles: the
processor executes a lock instruction to disable external bus grants and,
when the critical region updates are complete, an unlock instruction to
allow bus grants again.
Another mechanism to prevent DMA might be to temporarily disable the
device that will perform DMA. For example, if the DMA operations are being
performed by an Ethernet controller, disabling the Ethernet controller will
make sure no DMA operations are started when a critical region update is
being made.
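This device-disable pattern can be sketched as follows (the controller class and its fields are invented for illustration; a real driver would write device control registers instead):

```python
# Hypothetical sketch: temporarily disable the device that initiates
# DMA, update the shared descriptor, then re-enable the device. While
# disabled, the device cannot start any DMA operation, so the update is
# atomic from the DMA engine's point of view.
class EthernetController:
    def __init__(self):
        self.dma_enabled = True

    def disable(self):
        self.dma_enabled = False

    def enable(self):
        self.dma_enabled = True

    def try_dma_write(self, buffer, index, value):
        """Device-side DMA; refuses to run while the device is disabled."""
        if not self.dma_enabled:
            return False
        buffer[index] = value
        return True

rx_descriptor = [0, 0]                   # shared buffer descriptor
nic = EthernetController()

nic.disable()                            # hold off any new DMA operations
rx_descriptor[0] = 0x1000                # critical region update:
rx_descriptor[1] = 64                    # buffer address and length
started = nic.try_dma_write(rx_descriptor, 0, 0xDEAD)   # blocked: False
nic.enable()                             # allow DMA again

print(started)         # False -- no DMA could start during the update
print(rx_descriptor)   # [4096, 64] -- descriptor updated consistently
```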
12.5 Summary
DMA is a key element of a computer system. An I/O module has two major
functions: first, to interface with the CPU and memory via the system bus
or central switch; and second, to interface with one or more peripheral
devices through tailored data links. Basic I/O operations and the
programmed I/O, interrupt-driven I/O, and direct memory access (DMA)
techniques were also discussed.
Programmed I/O does not interrupt the processor; the processor must
periodically check the status of the I/O module until the task is completed.
We showed how characters are transferred between the processor and
keyboard and display devices. Interrupt-driven I/O allows the processor to
do other tasks while waiting for the I/O device to send an interrupt, and is
therefore more efficient than programmed I/O. Direct memory access
(DMA) allows a dedicated controller or I/O module to perform a data
transfer directly while freeing the processor for other tasks. DMA interrupts
the processor only when the task is done.
Finally, we examined the synchronization requirements for DMA and
interrupts.
Self Assessment Questions
1. In the ISA bus standard, up to ____________ of memory can be
addressed for DMA.
2. In ____________ mechanism, CPU is suspended twice.
3. A ____________ is a device, usually peripheral to a computer’s CPU
that is programmed to perform a sequence of data transfers on behalf of
the CPU.
4. Channels must be enabled by the ____________ for the DMA controller
to respond to DMA requests.
5. Keeping the interrupts disabled for a long time will increase the
____________.
12.6 Terminal Questions
1. Discuss in detail the functions and operations of DMA.
2. Explain the operation of DMA controller.
3. Explain the synchronization requirements for DMA and interrupts.
12.7 Answers
Self Assessment Questions:
1. 16 megabytes
2. single bus detached DMA
3. DMA controller
4. processor
5. interrupt latency
Terminal Questions:
1. Refer Section 12.2
2. Refer Section 12.3
3. Refer Section 12.4