Chapter 5 Computer Organization
We now study how a computer works as a
connected system of functional units to get
things done. Although there are many differ-
ent computers, most of them share the same
Von Neumann Architecture, with the following
characteristics:
1. A computer consists of four functional units:
memory, input-output, arithmetic-logic unit, and
a control unit.
1
What about the other stuff?
2. The stored program concept
All the instructions that will be executed by
the computer are represented as binary values
and stored inside the memory, together with
the data, also in binary form.
3. The sequential execution of a program
Starting with the first one in a logical sequence,
the instructions are fetched, one at a time, from the
memory to the control unit, where each is decoded
and executed.
The same process then goes to the next in-
struction in the logical sequence, and will keep
on doing this until all the instructions are exe-
cuted.
Thus, the whole process is done through a
loop.
2
Memory
Memory is the functional unit which stores and
retrieves the instructions and the data being
executed.
MAR holds the address of the next instruction
to be executed, or of a piece of data that
will be used. IR (instruction register) and MDR
hold the instruction and the data, respectively.
3
What is a RAM?
All information, both instructions and data, stored
in memory is represented internally as binary
numbers, i.e., strings of the two bits 0 and 1.
A RAM (Random Access Memory) based computer
memory, built from a combination of
the three gates (AND, OR, and NOT), has the
following three characteristics:
1) It is divided into fixed-size units called cells,
each of which is uniquely identified by using an
address;
2) The cell is the minimum unit of access, i.e.,
you cannot get a smaller amount out of the
memory. In other words, the data in a cell is atomic.
3) The time it takes to access information is
the same for all the cells. It is this feature that
gives RAM its name.
4
How big is your memory?
As each cell is identified with an N-bit address,
there are 2^N such cells (Cf. Page 7 of
Chapter 4 notes), and the range of such addresses
is 0 ... 2^N - 1. Hence, 2^N is called the
maximum memory size. Typical values for N are 16, 24, 32, and going towards 64.
To easily memorize them, we use KB, MB, GB,
TB, and PB to refer to 2^10, 2^20, 2^30, 2^40, and
2^50. Thus, we have the following:
2^10 = 1K (= 1,024)
2^20 = 1M (= 1,048,576)
2^30 = 1G (= 1,073,741,824)
2^40 = 1T (= 1,099,511,627,776)
2^50 = 1P (= 1,125,899,906,842,624)
Thus, a computer with 2^16 bytes of memory is
said to have 64KB of memory, and a computer
with 2^32 bytes of memory would be said to
have 4GB of memory.
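The relationship between the address width N and the memory size can be checked with a few lines of Python. This is only a sketch: the names max_memory_size and pretty are illustrative and do not come from the notes.

```python
# Hypothetical helpers; the names are illustrative, not from the notes.
UNITS = ["", "K", "M", "G", "T", "P"]  # each step is a factor of 2**10


def max_memory_size(n_bits):
    """Number of cells addressable with an n_bits-wide address (2**N)."""
    return 2 ** n_bits


def pretty(cells):
    """Express a cell count using the K, M, G, T, P prefixes where it divides evenly."""
    unit = 0
    while cells >= 1024 and cells % 1024 == 0 and unit < len(UNITS) - 1:
        cells //= 1024
        unit += 1
    return f"{cells}{UNITS[unit]}B"


for n in (16, 24, 32):
    print(n, max_memory_size(n), pretty(max_memory_size(n)))
```

With one byte per cell, N = 16 gives 64KB and N = 32 gives 4GB, matching the figures above.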
Homework: Exercises 2 and 3
5
What to do with memory?
There are two basic memory operations, and
both of them apply to the entire cell.
1. Fetch (address): This operation fetches a
copy of the content of the cell with the address
and returns it as the result, without changing
the content of the cell. Thus, it is called a
non-destructive fetch.
2. Store (address, value): This operation stores
the specified value into the cell with the spec-
ified address, which destroys the original con-
tent of that cell.
Memory access time means how long it takes
the computer to finish either a fetch or a store
operation. It is typically about 5 to 10 nanoseconds,
denoted by nsec (= 10^-9 second).
6
How to do it?
To complete those two operations, we need
two operands, an address and a value, kept in
two memory registers.
The memory address register (MAR) holds the
address of the cell to be worked on, while the
memory data register (MDR) (Cf. Page 3)
contains the data value being fetched or stored.
Recall that the decoder was discussed in Chapter 4.
Fetch(address)
Load the address into MAR.
Decode the address in the MAR.
Copy the content of that cell into MDR.
Store(address, value)
Load the address into MAR.
Load the value into MDR.
Decode the address in MAR.
Store the content of MDR into that cell.
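The two operations can be mimicked in software. The sketch below assumes a toy memory modeled as a Python list; the class name Memory and its attribute names are made up for illustration, and the "decode" step is simply list indexing.

```python
class Memory:
    """Toy RAM with a memory address register (MAR) and a memory data register (MDR)."""

    def __init__(self, n_bits):
        self.cells = [0] * (2 ** n_bits)  # 2**N cells, addresses 0 .. 2**N - 1
        self.mar = 0
        self.mdr = 0

    def fetch(self, address):
        self.mar = address                # load the address into MAR
        self.mdr = self.cells[self.mar]   # decode MAR, copy that cell into MDR
        return self.mdr                   # the cell itself is left unchanged

    def store(self, address, value):
        self.mar = address                # load the address into MAR
        self.mdr = value                  # load the value into MDR
        self.cells[self.mar] = self.mdr   # decode MAR, overwrite that cell


ram = Memory(4)                # 16 addressable cells
ram.store(0b0010, 42)
print(ram.fetch(0b0010))
print(ram.fetch(0b0010))       # fetching twice gives the same value
```

Fetching the same address twice returns the same value, which is what non-destructive means; a second store to that address would replace it.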
7
Decode the address
Given an address, we need to pick out one cell
among the 2^N cells and work with it. We can use
a decoder, as discussed on Pages 53 and 54 of
the previous chapter.
The circuit below provides 16
addressable cells. The general case is quite
similar.
With the selection lines, we can choose where
data will be retrieved. For example, if MAR
contains 0010, only that address will be activated,
and all the others will be “locked”.
Question: Will this mechanism work?
8
A general solution
It seems that such a circuit does not generalize
too far. For example, if we use a
memory with 2^16 cells, the decoder must have
65,536 output lines. This number gets even tougher to
manage when we use a bigger memory.
This problem can be solved by organizing the
memory into a 2-d structure, rather than a
linear list. The following shows how to choose the
cell whose address is 0010.
In such a structure, cells are stored in row-major
order, i.e., rows with smaller indices are
filled in before rows with bigger indices.
9
Two-D memory
With this kind of memory equipped with two
decoders, each cell is associated with two
groups of select lines, one connected to the
rows, the other to the columns.
When we send a signal down both a row and a
column select line, only the cell sitting at the
intersection of the selected row and column
will be selected.
Recall that the bits of an address are numbered from
the right end to the left. If the MAR contains
0010, then the two higher-order bits, 00, are
sent to the row decoder, while the two lower-order
bits, 10, are sent to the column decoder. Then the
cell with address 0010 will be selected.
This alternative memory structure is thus much
more efficient to work with.
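The address split and the saving in decoder output lines can be sketched as follows. The function names are hypothetical, and the sketch assumes an even number of address bits so that the row and column decoders are the same size.

```python
def split_address(address, n_bits):
    """Split an n_bits address into (row, column) for a 2-d memory.

    Assumes n_bits is even, so each decoder receives n_bits // 2 bits.
    """
    half = n_bits // 2
    row = address >> half                # higher-order bits go to the row decoder
    col = address & ((1 << half) - 1)    # lower-order bits go to the column decoder
    return row, col


def decoder_output_lines(n_bits):
    """Output lines needed: one big decoder versus a row plus a column decoder."""
    one_d = 2 ** n_bits
    two_d = 2 * 2 ** (n_bits // 2)
    return one_d, two_d


print(split_address(0b0010, 4))    # row bits 00, column bits 10
print(decoder_output_lines(16))    # one decoder versus two smaller ones
```

For a 16-bit address, the single decoder needs 65,536 output lines, while the two 8-bit decoders need only 256 each.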
Homework: Exercise 6
10
Cache memory
When the processor needs a piece of data,
it simply goes out to the memory and fetches
it. As the processor gets faster and faster, it
spends more and more of its time waiting for the data. Thus,
memory access becomes a bottleneck, as
memory works much more slowly than the processor.
We can certainly try to increase the speed
and/or size of the memory. But, at that time,
it was far too expensive.
It was soon discovered that when a program
fetches a piece of data, it is quite likely that the
same thing will be needed in the near future.
This phenomenon is referred to as the locality
principle.
Hence, once a piece of data is used, it can be
moved over to a cache memory, a smaller, but
much faster, memory unit, so that later on,
when this piece is needed again, the processor
can go directly to the Cache.
11
How memory really works?
When the computer needs a piece of informa-
tion, it goes through the following steps.
1. Look first in the Cache to see if the in-
formation is already there. If it is there, the
processor gets it, and it is done.
2. If that information is not in the Cache mem-
ory, it then follows the usual fetch process, to
get it from the RAM.
3. Copy the data into the Cache memory. If
the Cache is full, discard some of the material
that has not been recently accessed.
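The three steps can be sketched in Python. The notes only say that material not recently accessed is discarded; the sketch below assumes one specific policy, least recently used (LRU), and the class name CachedMemory is made up for illustration.

```python
from collections import OrderedDict


class CachedMemory:
    """Toy cache in front of RAM; evicts the least recently used entry when full."""

    def __init__(self, ram, capacity):
        self.ram = ram                  # any list-like object indexed by address
        self.capacity = capacity
        self.cache = OrderedDict()      # address -> value, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.cache:               # step 1: look in the cache first
            self.hits += 1
            self.cache.move_to_end(address)     # mark as recently used
            return self.cache[address]
        self.misses += 1
        value = self.ram[address]               # step 2: usual fetch from RAM
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)      # step 3: discard the least recent entry
        self.cache[address] = value             # step 3: copy the data into the cache
        return value


mem = CachedMemory(ram=list(range(100, 116)), capacity=2)
for addr in (3, 3, 5, 3, 7, 5):
    mem.read(addr)
print(mem.hits, mem.misses)
```

Repeated reads of address 3 are served from the cache, while the final read of address 5 misses because it was evicted to make room for address 7.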
Assume that the access times for Cache and
RAM are 2 and 10 nsec (Cf. Page 6), respectively,
and further assume that the information
is in the Cache 70% of the time. Then the overall
average access time is
0.7 × 2 + 0.3 × (2 + 10) = 5 nsec, which is a 50% reduction of
the usual access time.
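The same calculation can be written as a small function, which makes it easy to try other hit rates. The function name and parameter names are illustrative only; a miss is assumed to cost a cache check plus a RAM fetch, as in the formula above.

```python
def average_access_time(hit_rate, cache_ns, ram_ns):
    """Average access time when a miss costs a cache check plus a RAM fetch."""
    return hit_rate * cache_ns + (1 - hit_rate) * (cache_ns + ram_ns)


avg = average_access_time(hit_rate=0.7, cache_ns=2, ram_ns=10)
print(f"{avg:.1f} nsec")
```

Raising the hit rate towards 100% brings the average down towards the cache access time of 2 nsec.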
12
The memory structure
MAR specifies where, while MDR says what.
We will talk a lot more about the locality prin-
ciple, Cache, and other memory related issues,
in CS 2220 Computer Hardware and CS 4310
Operating System.
13
I/O and mass storage
The input-output units are the devices that al-
low a computer to communicate with the out-
side world, as well as store information inside
the computer.
Although there are many different I/O mecha-
nisms, there are two invariant important princi-
ples: I/O access methods and I/O controllers.
Input/output devices come in two basic
types: those dealing with human beings and
those dealing with machines. For the former,
we have such well-known devices as the keyboard,
mouse, printer, screen, etc. The latter,
referred to as mass storage systems, include hard
disks, DVD-ROMs, flash drives, and tapes.
We also use the cloud to keep lots of data today.
14
I/O controller
A fundamental feature of many I/O devices is that they are very slow, a million times slower, compared to other components.
For example, a hard disk is cut into lots of tracks, and each track is further cut into lots of sectors. An R/W head has to move in and out, mechanically, to read/write data from/into a sector, which could hold 1 KB of data.
Question: Do you listen to music with LP?
15
How to play an LP record?
This is what an LP record looks like.
Question: How do we play it?
16
What time?
The access time to a sector of a disk drive
consists of three parts: seek time, latency and
transfer time.
Seek time is the time needed to position the
R/W head over the correct track, by mechanically
moving the arm in or out.
Latency is the time for the beginning of the
right sector to rotate under the R/W head, once
the head is positioned on that track.
Both the seek time and the latency are
associated with some mechanical movement,
and are thus slow.
When the R/W head gets to the right spot,
we could transfer the data by either reading it
from a sector, or writing the data there.
17
How much time does it take?
It could take 0.02 msec (millisecond, or 1/1000
of a second) to move the R/W arm from a
track to the one next to it.
Thus, if we are right there, we could spend
0 msec; if we have to go through all the way
from track 0 to track 999, we have to spend
19.98 (= 999×0.02) msec in seek time. If, on
average, the arm has to move over 300 tracks,
it takes on average 6 msec for this part.
Rotation speed could be 7200 rev/minute, i.e.,
120 rev/sec. Thus, it takes 1/120 sec (=8.33
msec) to finish one revolution.
Thus, the best of the latency time is 0, and
the worst is 8.33 msec, and on average, we
rotate half of a circle, taking 4.17 msec.
18
Regarding the transfer time, for each of the 64
sectors in a track, it takes 1/64 × 8.33 msec
=0.13 msec to read in a sector containing 1
KB (=1,024 Bytes). This holds true all the
time.
All the above analysis of the access time, in
msec (milliseconds), for this disk
to get 1 KB (= 1,024 bytes) of data can be
summarized as follows:

            Best    Worst    Average
Seek time   0       19.98    6
Latency     0       8.33     4.17
Transfer    0.13    0.13     0.13
Total       0.13    28.44    10.30

It thus takes much longer, about 10^6 times longer,
to get a piece of data in or out of a disk, as
compared with memory (Cf. Page 6).
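The three components can be recomputed from the drive parameters used in these notes (0.02 msec per track, 7200 rev/minute, 64 sectors per track). The function below is only a sketch; its name and parameters are chosen for illustration.

```python
def access_time_ms(tracks_moved, fraction_of_turn, ms_per_track=0.02,
                   rpm=7200, sectors_per_track=64):
    """Seek + latency + transfer time, in msec, for reading one sector."""
    revolution = 60_000 / rpm                    # msec for one full revolution
    seek = tracks_moved * ms_per_track           # moving the arm across tracks
    latency = fraction_of_turn * revolution      # waiting for the sector to arrive
    transfer = revolution / sectors_per_track    # one sector passing under the head
    return seek + latency + transfer


best = access_time_ms(0, 0.0)        # already on the track, sector right there
worst = access_time_ms(999, 1.0)     # cross 999 tracks, wait a full revolution
average = access_time_ms(300, 0.5)   # cross 300 tracks, wait half a revolution
print(f"best {best:.2f}  worst {worst:.2f}  average {average:.2f} msec")
```

Changing sectors_per_track to 20 adapts the calculation to a disk with fewer sectors per track.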
Homework: Check out the Practice Problem
on Page 244, then complete Exercise 12. No-
tice there are just 20 sectors per track, instead
of 64.
19
ALU
The ALU (Arithmetic and Logic Unit) is to get
things done. It contains circuits for arithmetic
addition, subtraction, multiplication, and di-
vision; as well as circuits for comparison and
logic equality, again in terms of the three gates
that we studied in the last chapter: AND, OR,
and NOT.
It also contains registers, which are high-speed,
dedicated memory cells connected to circuits.
All these are connected with data paths, which
let data flow in between registers and circuits.
20
How about a more general one?
There are certainly lots of registers in an ALU,
as shown below.
This one holds 16 registers, R0 through R15.
Any of them can hold operands of some op-
eration, as well as its result: typically two in,
and one result out, sent back to a register.
21
A little more on registers
A register can hold the operand of an arith-
metic operation or the result of an operation.
Registers are quite similar to the RAM cells,
but instead of using a numeric address, e.g.,
0011, we get access to them by a special reg-
ister designator, such as A, X, or R0.
Registers can be accessed much more quickly
than regular RAM cells, because they are located
inside the processor; and they are used only
for specific purposes, because we have only a
few of them.
A typical processor has between 12 and 24 registers.
In general, the more it has, the faster a program
runs.
22
The control unit
The control unit is to 1) fetch from memory
the next instruction to be executed (below shows the typical format of such an instruction);
2) decode the instruction, using its OpCode, to determine what is to be done; and
3) execute it by issuing the appropriate commands
to the processor (ALU), memory, and I/O devices.
Starting with the first instruction, this process is repeated until the last one is executed. (We
mentioned this on Page 2, and will present
more details on Pages 35 and 36.)
The collections of all the instructions that can
be decoded and executed by the control unit
are called machine language.
23
CU registers
To fetch and execute instructions, the con-
trol unit uses two special purpose registers, the
program counter (PC), and the instruction reg-
ister (IR) together with an instruction decoder
circuitry.
The program counter holds the address of the
next instruction to be executed.
To get that instruction, the CU sends the content
of PC to the MAR, executes a fetch
operation, and puts the fetched instruction
itself into the instruction register (IR).
The controller will then try to figure out what
to do, and then get it done.
24
What to do?
The operation code (OpCode) part of the IR is
sent to an Instruction decoder to further de-
termine what is to be done.
If the OpCode field has k bits, there will be at
most 2^k operations.
For each of these operations, this ALU will
carry out the corresponding operation, and the
above decoded information is further fed into
the Select lines of the ALU to choose the de-
sired result, among a bunch.
Question: Can you tell me all the details?
25
Here it goes...
After data flow into an ALU, it will 1)
do everything, i.e., compute addition, subtraction,
multiplication, division, AND, OR, NOT,
comparison, etc., all at once.
It then 2) uses a multiplexer (Cf. Pages 51 and
52 of Chapter 4 notes) to choose the desired
result according to the information sent
over by the CU decoder, and sends it out to a
register.
For example, a value 00 of the OpCode will en-
able and select the output of the adder through
the above 2 × 4 multiplexer.
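The "compute everything, then select" idea can be imitated in a few lines. Only the assignment of 00 to the adder comes from the notes; the other three opcode assignments below are assumptions made for the sake of the example.

```python
def alu(opcode, a, b):
    """Toy ALU: every circuit computes its result, then one result is selected.

    Only opcode 00 (the adder) follows the notes; the rest are assumed.
    """
    results = [
        a + b,   # 00: adder
        a - b,   # 01: subtractor (assumed)
        a & b,   # 10: bitwise AND (assumed)
        a | b,   # 11: bitwise OR (assumed)
    ]
    return results[opcode]   # the select lines pick one of the four outputs


print(alu(0b00, 6, 3))
```

All four results are computed on every call, and the opcode merely chooses which one leaves the ALU, just as the select lines do in hardware.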
26
This is the computer...
... that we will use to get things done.
All the parts are hooked with a bus.
We will talk about GT, EQ and LT on Page 31.
27
What about a program?
A machine language program consists of instructions, each of which contains an operation code and the addresses of its associated data (Page 22).
For example, the instruction Add X, Y, whose operation code is 9 (1001), means to add the two values stored in X, located in cell 99, and Y, located in cell 100, and then put the result back in cell Y. It might look like the following:
00001001 0000000001100011 0000000001100100
In general, the operation code contains a unique integer assigned to each operation, which can then be recognized by the hardware.
The address fields contain the addresses of the values that this operation will work with. There are usually between 0 and 3 such fields, since after getting two values, we may want to specify where the result is to be placed.
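To see how the bit pattern for Add X, Y arises, the following sketch packs an operation code and its address fields into binary strings. The field widths (8 and 16 bits) are read off the example above; the function names encode and decode are hypothetical.

```python
def encode(opcode, *addresses, opcode_bits=8, address_bits=16):
    """Pack an opcode and its address fields into space-separated bit fields."""
    fields = [format(opcode, f"0{opcode_bits}b")]
    fields += [format(addr, f"0{address_bits}b") for addr in addresses]
    return " ".join(fields)


def decode(bits):
    """Inverse of encode: return (opcode, [addresses]) as integers."""
    opcode, *addresses = bits.split()
    return int(opcode, 2), [int(a, 2) for a in addresses]


word = encode(9, 99, 100)   # Add X, Y with X in cell 99 and Y in cell 100
print(word)
print(decode(word))
```

Encoding opcode 9 with addresses 99 and 100 reproduces the three fields of the example, and decode recovers the original integers.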
Homework: Exercise 10
28
It all depends...
The set of all the operations executable by a
processor is called its instruction set. There is
no standard as to what should be included in this
set.
Operations, and indeed, machine languages,
vary from machine to machine. That is why
an iPhone 13, using the A15 processor, cannot directly
execute a program written for an Intel Core i7.
The machine languages on most computers are
quite elementary: each instruction carries out a very
small, specific, and simple task.
The power of a computer does not lie in what
it can do, but in how quickly it can do it.
We will correct the above statement in CS 3780
Intro. to Computational Theory.
29
More or less?
The trend is to keep the set as small and simple
as possible. These machines are called reduced
instruction set computers (RISC), with just 30
to 50 different instructions.
This can minimize the hardware circuitry needed
to build a computer. A program for a RISC
machine may require more instructions, but
this is made up by the fact that they can be
executed much faster.
In the 1970s, a typical processor might have from
300 to 500 machine language instructions, and was thus
called a CISC (complex instruction set computer).
Such processors are more complex, more expensive, and
more difficult to construct. Most modern processors
follow a mixture of these two approaches.
Check out the piece on the course page.
30
Instruction categories
Machine language instructions can be catego-
rized into four basic classes: data transfer,
arithmetic, compare, and branch.
1. Data transfer instructions move information
between, or within, the different components
of the computer, e.g., memory cells, ALU, and
registers.
For example, LOAD X means load register R with
CON(X), i.e., the content of memory cell X;
and STORE X is just the opposite, i.e., store the
content of R into cell X. For example,
000000000000 000000000101
loads the content of location 5 into the register
R.
MOVE X Y is to move CON(X) to the memory cell
whose address is Y.
31
2. Arithmetic instructions apply various oper-
ations on data.
For example, ADD X, Y, Z means to fill cell Z
with the sum of CON(X) and CON(Y); ADD X, Y
is to fill Y with the sum of CON(X) and CON(Y),
and ADD X is to fill R with the sum of CON(X)
and that of R.
3. Compare instructions compare two values
and set an indicator accordingly.
For example, COMPARE X Y is to compare CON(X)
and CON(Y), then set three condition codes, LT,
EQ, GT, accordingly.
If CON(X) > CON(Y), then GT=1, EQ=0, LT=0.
If CON(X) = CON(Y), then GT=0, EQ=1, LT=0.
If CON(X) < CON(Y), then GT=0, EQ=0, LT=1.
32
4. The general structure of the Von Neumann
machine is sequential (Cf. Page 2). The
branch instructions change this normal flow of
control, based on the result of a condition
test.
For example, JUMP X will take the next instruc-
tion located at X, or jump to X no matter what.
JUMPGT X will do a jump if the code GT is set to
1.
JUMPGE X will do a jump if either the code GT=1
or EQ=1.
JUMPLT, JUMPEQ, JUMPLE and JUMPNEQ function sim-
ilarly.
HALT indicates the end of the execution of the
program.
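The condition codes and the branch decisions can be sketched together. JUMPGT and JUMPGE follow the descriptions above; the entries for JUMPLT, JUMPEQ, JUMPLE and JUMPNEQ are inferred by analogy and should be read as assumptions.

```python
def compare(con_x, con_y):
    """COMPARE X Y: set exactly one of the condition codes GT, EQ, LT to 1."""
    return {"GT": int(con_x > con_y),
            "EQ": int(con_x == con_y),
            "LT": int(con_x < con_y)}


# Which condition codes trigger each conditional jump.
# Entries other than JUMPGT and JUMPGE are inferred by analogy (assumption).
JUMP_IF = {
    "JUMPGT":  ("GT",),
    "JUMPGE":  ("GT", "EQ"),
    "JUMPLT":  ("LT",),
    "JUMPEQ":  ("EQ",),
    "JUMPLE":  ("LT", "EQ"),
    "JUMPNEQ": ("GT", "LT"),
}


def jump_taken(instruction, codes):
    """Return True if the branch instruction would change the flow of control."""
    if instruction == "JUMP":      # unconditional: jump no matter what
        return True
    return any(codes[c] == 1 for c in JUMP_IF[instruction])


codes = compare(7, 3)
print(codes, jump_taken("JUMPGE", codes), jump_taken("JUMPLE", codes))
```

After comparing 7 with 3, only GT is set, so JUMPGE would jump while JUMPLE would fall through to the next instruction.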
We are ready to look at some examples.
33
Examples
Check out the following classic examples.
It shows an algorithm, a program in Python,
and one in a machine language.
We want to use an algorithm, but have to use a
program, e.g., in Python, whose surface we will
scratch later, and a lot more in CS2370:
Intro. to Programming.
34
Fundamentals of Computing
Our Von Neumann machine, with a four-bit
OPcode, can run sixteen instructions.
In Lab 8, we will witness the actual process of
executing a program, in machine language, in
such a computer, by combining all the pieces
that we have talked about so far.
35
Algorithmically speaking
The execution of a program can be summa-
rized as follows, previewed on Page 23:
Repeat until a HALT instruction or an error
Fetch phase
Decode phase
Execute phase
Fetch refers to the following:
PC -> MAR     Send the address of the next instruction to MAR
FETCH         Fill MDR with the content of the cell whose address is in MAR
MDR -> IR     Fill IR with the content of MDR
PC+1 -> PC    Increase PC by 1, giving the address of the next instruction
36
Decode refers to the following:
IR(OPCode) -> instruction decoder
The opcode part of the instruction is sent to
the decoder, which is to activate the circuitry
to carry out this instruction.
Execute depends on the nature of the instruction.
For Add X, i.e., add the content of cell X
and that of register R, and put the sum back
into R, the steps are:
IR(address) -> MAR   The address of the operand goes to MAR
FETCH                The operand goes to MDR
MDR -> ALU           The operand goes to ALU
R -> ALU             The other value is also moved from R to ALU
ADD                  Add them up
ALU -> R             The sum goes back to R
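The whole cycle can be imitated with a very small simulator. This is a sketch under simplifying assumptions: there is one register R, instructions are stored as (mnemonic, address) tuples rather than binary values, and only LOAD, ADD, STORE and HALT are supported. The function name run is made up for illustration.

```python
def run(memory):
    """Fetch-decode-execute loop for a toy one-register machine.

    Instruction cells hold (mnemonic, address) tuples and data cells hold
    integers. This encoding is an assumption made for readability; a real
    machine stores both as binary values.
    """
    pc, r = 0, 0                      # program counter and register R
    while True:
        # Fetch phase: PC -> MAR, FETCH, MDR -> IR, PC + 1 -> PC
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc += 1
        # Decode phase: split IR into its opcode and address fields
        op, addr = ir
        # Execute phase: depends on the instruction
        if op == "LOAD":              # R <- CON(addr)
            r = memory[addr]
        elif op == "ADD":             # R <- R + CON(addr)
            r = r + memory[addr]
        elif op == "STORE":           # CON(addr) <- R
            memory[addr] = r
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Program: Y = X + Y, with X in cell 4 and Y in cell 5.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 5), ("HALT", None), 20, 22]
print(run(program)[5])
```

The program loads X into R, adds Y to it, and stores the sum back into Y before halting.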
Assignment: Read through the rest of §5.3
37
What have we learned?
We saw, on Page 28, that to sum up two numbers
kept in cells 99 and 100, we need to use the
following instruction:
00001001 0000000001100011 0000000001100100
You will work out the details of executing such
an instruction in Lab 8.
ADD X Y
It is way too tough for us to work with this
stuff.
Later on, we will see some more user-friendly
ways of doing our work, e.g.,
Y = X + Y;
We will also see how to translate a program
that we want to work with to an equivalent
one that a computer needs to work with.
38
The future
The problems that computers are asked to solve
keep growing significantly in size and complexity. So
far, we have been able to keep up with the demand by
building faster and faster von Neumann machines.
The first generation could execute about 10,000
instructions per second. Today, even a PC can
do 1 MIPS (Million Instructions Per Second),
and larger ones can do 10 to 20 BIPS (Billion
Instructions Per Second). However, such growth is slowing down.
Due to the limit on putting more and more
gates ever closer together on a chip, the famous Moore’s
law is coming to an end, and we now have this von Neumann
bottleneck.
One way to overcome this problem is to use
parallel processing architecture with multiple
processors. For example, the A15 processor
inside iPhone 13 has six processing cores.
39
Various types
Another type is called MIMD, also called cluster computing, where we apply different instructions to different sets of data. All these processors communicate with each other through an interconnection network.
For example, when looking for a number in a phone book, we can divide the book into many pieces and assign each piece to one processor. Then, the searching can be done in parallel with a linear speed-up.
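The phone-book idea can be sketched with Python's standard concurrent.futures module. The function names, the made-up phone book, and the choice of four workers are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def search_piece(piece, target):
    """Linear search of one piece of the phone book; returns the entry or None."""
    for name, number in piece:
        if number == target:
            return name, number
    return None


def parallel_search(book, target, workers=4):
    """Split the book into about `workers` pieces and search them concurrently."""
    size = max(1, -(-len(book) // workers))      # ceiling division, at least 1
    pieces = [book[i:i + size] for i in range(0, len(book), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(search_piece, pieces, [target] * len(pieces))
        for hit in results:
            if hit is not None:
                return hit
    return None


# A made-up phone book of 1000 entries.
book = [(f"person{i}", 5550000 + i) for i in range(1000)]
print(parallel_search(book, 5550742))
```

In standard CPython, threads share one interpreter lock, so this sketch shows how the work is divided rather than delivering a real linear speed-up; separate processes or separate machines would be needed for that.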
We will talk a lot more about computer architecture in CS 4250 Computer Architecture,
and parallel programming in CS 3221 Algorithm Analysis and CS4310 Operating Systems.
40
A couple of points
1. When making use of a MIMD architecture,
we have to keep all the processors as busy
as possible; otherwise, we are wasting our precious
resources.
2. We also have to minimize the amount of
inter-processor communication, since if a processor
talks too much, it won’t have much time
to work. This is easy in the aforementioned
searching problem, but turns out to be difficult
in general.
3. An effort has been made to build non-von
Neumann types of computers: If you cannot
build something that will work twice as
fast, build something that can do two things
at once.
Assignment: Read the rest of Section 5.4.
Lab time: Lab 8 on computer system with von
Neumann Machine
41