Chapter 5 Computer Organization


We now study how a computer works as a

connected system of functional units to get

things done. Although there are many differ-

ent computers, most of them share the same

Von Neumann Architecture, with the following

characteristics:

1. A computer consists of four functional units:

memory, input-output, arithmetic-logic unit, and

a control unit.

1

What about the other stuff?

2. The stored program concept

All the instructions that will be executed by

the computer are represented as binary values

and stored inside the memory, together with

the data, also in binary form.

3. The sequential execution of a program

Starting with the first one in a logical sequence, instructions are fetched one at a time from the memory to the control unit, where each is decoded and executed.

The same process then goes to the next in-

struction in the logical sequence, and will keep

on doing this until all the instructions are exe-

cuted.

Thus, the whole process is done through a

loop.

2

Memory

Memory is the functional unit which stores and retrieves the instructions and the data being processed.

MAR holds the address of the next instruction to be executed, or of a piece of data that will be used. IR (Instruction register) and MDR hold the instruction and the data, respectively.

3

What is a RAM?

All information, both instruction and data, stored

in memory are represented internally as binary

numbers, i.e., sequences of the two bits 0 and 1.

A RAM (Random Access Memory) based computer memory, built from combinations of the three gates, has the following three characteristics:

1) It is divided into fixed-size units called cells,

each of which is uniquely identified by using an

address;

2) The cell is the minimum unit of access, i.e.,

you cannot get a smaller amount out of the

memory. In other words, data is atomic.

3) The time it takes to access information is

the same for all the cells. It is this feature that

gives RAM its name.

4

How big is your memory?

As each cell is identified with an N-bit address, there are 2^N such cells (Cf. Page 7 of Chapter 4 notes), and the range of such addresses is 0 ... 2^N - 1. Hence, 2^N is called the maximum memory size. Typical values for N are 16, 24, 32, and going towards 64.

To easily memorize them, we use KB, MB, GB, TB, and PB to refer to 2^10, 2^20, 2^30, 2^40 and 2^50 bytes. Thus, we have the following:

2^10 = 1K (= 1,024)

2^20 = 1M (= 1,048,576)

2^30 = 1G (= 1,073,741,824)

2^40 = 1T (= 1,099,511,627,776)

2^50 = 1P (= 1,125,899,906,842,624)

Thus, a computer with 2^16 bytes of memory is said to have 64KB of memory, and a computer with 2^32 bytes of memory would be said to have 4GB of memory.
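The conversion from an address width N to a memory size can be sketched in a few lines of Python. This is only an illustration: the function name `max_memory_size` and the `UNITS` list are made up for this example, and it assumes each cell holds one byte.

```python
# Unit prefixes and their powers of two, as listed above (largest first).
UNITS = [("P", 50), ("T", 40), ("G", 30), ("M", 20), ("K", 10)]

def max_memory_size(n_bits: int) -> str:
    """Return 2**n_bits bytes written with the largest fitting unit."""
    size = 2 ** n_bits                      # number of addressable cells
    for name, exponent in UNITS:
        if size >= 2 ** exponent:
            return f"{size // 2 ** exponent}{name}B"
    return f"{size}B"                       # smaller than 1KB

print(max_memory_size(16))  # 64KB
print(max_memory_size(32))  # 4GB
```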

Homework: Exercises 2 and 3

5

What to do with memory?

There are two basic memory operations, and

both of them apply to the entire cell.

1. Fetch (address): This operation fetches a

copy of the content of the cell with the address

and returns it as the result, without changing

the content of the cell. Thus, it is called a

non-destructive fetch.

2. Store (address, value): This operation stores

the specified value into the cell with the spec-

ified address, which destroys the original con-

tent of that cell.

Memory access time means how long it takes

the computer to finish either a fetch or store

operation. It is typically about 5 to 10 nanoseconds, denoted by nsec (= 10^-9 second).

6

How to do it?

To complete those two operations, we need

two operands, an address and a value, kept in

two memory registers.

The memory address register (MAR) holds the

address of the cell to be worked on, while the

memory data register (MDR) (Cf. Page 3)

contains the data value being fetched or stored.

Recall that the decoder was discussed in Chapter 4.

Fetch(address)

Load the address into MAR.

Decode the address in the MAR.

Copy the content of that cell into MDR.

Store(address, value)

Load the address into MAR.

Load the value into MDR.

Decode the address in MAR.

Store the content of MDR into that cell.
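The two step lists above can be mimicked in software. The following is a minimal sketch, not a description of real hardware: the class name `Memory` and the helper `_decode` are hypothetical, and the decoder circuit is reduced to a simple range check.

```python
class Memory:
    """A toy memory with 2**address_bits cells, a MAR and an MDR."""

    def __init__(self, address_bits: int):
        self.cells = [0] * (2 ** address_bits)
        self.mar = 0   # memory address register
        self.mdr = 0   # memory data register

    def _decode(self) -> int:
        # Stand-in for the decoder circuit: pick the one cell named by MAR.
        if not 0 <= self.mar < len(self.cells):
            raise IndexError("address out of range")
        return self.mar

    def fetch(self, address: int) -> int:
        self.mar = address              # load the address into MAR
        cell = self._decode()           # decode the address in MAR
        self.mdr = self.cells[cell]     # copy the cell content into MDR
        return self.mdr                 # the cell itself is unchanged

    def store(self, address: int, value: int) -> None:
        self.mar = address              # load the address into MAR
        self.mdr = value                # load the value into MDR
        cell = self._decode()           # decode the address in MAR
        self.cells[cell] = self.mdr     # overwrite the old content


memory = Memory(address_bits=4)
memory.store(0b0010, 99)
print(memory.fetch(0b0010))  # 99
```

Fetching the same address twice returns the same value, which is what makes the fetch non-destructive, while a second store simply replaces the old content.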

7

Decode the address

Given an address, we need to specify a cell among 2^N cells and work with it. We can use a decoder, as discussed on Pages 53 and 54 of the previous chapter.

The figure below shows such a circuit that provides 16

addressable cells. The general case is quite

similar.

We can choose with the selection lines where

data will be retrieved. For example, if MAR

contains 0010, only that address will be acti-

vated, and all the others will be “locked”.

Question: Will this mechanism work?

8

A general solution

It seems that such a circuit does not generalize very far. For example, if we use a memory with 2^16 cells, the decoder must have 65,536 output lines, and this number gets even tougher to manage with a bigger memory.

This problem can be solved by organizing the

memory into a 2-d structure, rather than a

linear list. The following shows how to choose the cell whose address is 0010.

In such a structure, cells are stored in row-major order, i.e., rows with smaller indices will be filled in before rows with bigger indices.

9

Two-D memory

With this kind of memory equipped with two decoders, each cell will be associated with two groups of select lines, one connected to the rows, the other to the columns.

When we send a signal down both row and

column select lines, only the cell sitting in the

intersection of the selected row and column

will be selected.

Recall that the bits of an address are numbered from the right end to the left. If the MAR contains 0010, then the two higher-order bits, 00, will be sent to the row decoder, while the two lower-order bits, 10, will be sent to the column decoder. Then the cell with address 0010 will be selected.

This alternative memory structure is thus much

more efficient to work with.
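To make the saving concrete, here is a small Python sketch of the address split and of the number of decoder output lines needed in each organization. The function names `split_address` and `select_lines` are invented for this illustration, and the sketch assumes an even number of address bits, split evenly between the two decoders.

```python
def split_address(address: int, n_bits: int) -> tuple[int, int]:
    """Split an n_bits-wide address into (row, column) parts."""
    half = n_bits // 2
    row = address >> half                  # higher-order bits go to the row decoder
    col = address & ((1 << half) - 1)      # lower-order bits go to the column decoder
    return row, col

def select_lines(n_bits: int) -> tuple[int, int]:
    """Output lines for one big decoder vs. a row decoder plus a column decoder."""
    half = n_bits // 2
    return 2 ** n_bits, 2 ** (n_bits - half) + 2 ** half

print(split_address(0b0010, 4))  # (0, 2): row 00, column 10
print(select_lines(16))          # (65536, 512)
```

For the memory with 2^16 cells, the two decoders together need 512 output lines instead of 65,536.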

Homework: Exercise 6

10

Cache memory

When the processor needs a piece of data, it simply goes out to the memory and fetches it. As the processor gets faster and faster, it waits longer and longer for the data. Thus, memory access becomes a bottleneck, as memory works much more slowly than a processor.

We can certainly try to increase the speed

and/or size of the memory. But, at that time,

it was far too expensive.

It was soon discovered that when a program

fetches a piece of data, it is quite likely that the

same thing will be needed in the near future.

This phenomenon is referred to as the locality

principle.

Hence, once a piece of data is used, it can be

moved over to a cache memory, a smaller, but

much faster, memory unit, so that later on,

when this piece is needed again, the processor

can go directly to the Cache.

11

How does memory really work?

When the computer needs a piece of informa-

tion, it goes through the following steps.

1. Look first in the Cache to see if the in-

formation is already there. If it is there, the

processor gets it, and it is done.

2. If that information is not in the Cache mem-

ory, it then follows the usual fetch process, to

get it from the RAM.

3. Copy the data into the Cache memory. If the Cache is full, discard some of the material that has not been recently accessed.

Assume that the access times for Cache and RAM are 2 and 10 nsecs (Cf. Page 6), respectively, and further assume that the information is in the Cache 70% of the time. Then the overall average access time is 5 (= 0.7 × 2 + 0.3 × (2 + 10)) nsecs, which is a 50% reduction of the usual access time.
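The same arithmetic can be written as a short Python function, which makes it easy to try other hit rates. The function name `average_access_time` and its parameter names are assumptions made for this sketch; the formula is the one used above, where a miss costs a Cache lookup plus a RAM fetch.

```python
def average_access_time(cache_ns: float, ram_ns: float, hit_rate: float) -> float:
    """Average access time in nsec for a cache with the given hit rate."""
    hit_cost = hit_rate * cache_ns                     # found in the Cache
    miss_cost = (1 - hit_rate) * (cache_ns + ram_ns)   # look in the Cache, then go to RAM
    return hit_cost + miss_cost

# Rounded to hide floating-point noise.
print(round(average_access_time(2, 10, 0.7), 2))  # 5.0
```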

12

The memory structure

MAR specifies where, while MDR says what.

We will talk a lot more about the locality prin-

ciple, Cache, and other memory related issues,

in CS 2220 Computer Hardware and CS 4310

Operating System.

13

I/O and mass storage

The input-output units are the devices that al-

low a computer to communicate with the out-

side world, as well as store information inside

the computer.

Although there are many different I/O mecha-

nisms, there are two invariant important princi-

ples: I/O access methods and I/O controllers.

Input/output devices come in two basic types: those dealing with human beings and those dealing with machines. The former includes such well-known devices as the keyboard, mouse, printer, screen, etc. The latter, referred to as mass storage systems, includes hard disks, DVD-ROMs, flash drives and tapes.

We also use the cloud to keep lots of data today.

14

I/O controller

A fundamental feature of many I/O devices is that they are very slow, a million times slower, compared to other components.

For example, a hard disk is cut into lots of tracks, and each track is further cut into lots of sectors. An R/W head has to move in and out, mechanically, to read/write data from/into a sector, which could hold 1 KB of data.

Question: Do you listen to music with LP?

15

How to play an LP record?

This is what an LP record looks like.

Question: How to play with it?

16

What time?

The access time to a sector of a disk drive

consists of three parts: seek time, latency and

transfer time.

Seek time is the time needed to position the R/W head over the correct track, by mechanically moving the arm in or out.

Latency is the time for the right sector to rotate under the R/W head, once the head is positioned over that track.

Both the seek time and the latency time are

associated with some mechanical movement,

thus slow.

When the R/W head gets to the right spot,

we could transfer the data by either reading it

from a sector, or writing the data there.

17

How much time does it take?

It could take 0.02 msec (millisecond, or 1/1000 of a second) to move the R/W arm from a track to the one next to it.

Thus, if we are right there, we could spend

0 msec; if we have to go through all the way

from track 0 to track 999, we have to spend

19.98 (= 999×0.02) msec in seek time. If, on

average, the arm has to move over 300 tracks,

it takes on average 6 msec for this part.

Rotation speed could be 7200 rev/minute, i.e.,

120 rev/sec. Thus, it takes 1/120 sec (=8.33

msec) to finish one revolution.

Thus, the best of the latency time is 0, and

the worst is 8.33 msec, and on average, we

rotate half of a circle, taking 4.17 msec.

18

Regarding the transfer time, for each of the 64

sectors in a track, it takes 1/64 × 8.33 msec

=0.13 msec to read in a sector containing 1

KB (=1,024 Bytes). This holds true all the

time.

All the above analysis of the access time, in terms of msec, i.e., milliseconds, for this disk to get 1KB (=1,024 Bytes) of data can be summarized as follows:

              Best         Worst         Average
Seek time     0            19.98         6
Latency       0            8.33          4.17
Transfer      0.13         0.13          0.13
Total         0.13 msec    28.44 msec    10.30 msec

It thus takes much longer, about 10^6 times slower, to get a piece of data in or out of a disk, as compared with memory (Cf. Page 6).
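The seek, latency and transfer figures above can be recomputed with a short Python sketch, which is also handy for disks with other parameters. The function name `disk_access_times` and all parameter names are made up for this example; the default values are the ones used in this section (1,000 tracks, 0.02 msec per track, 300 tracks moved on average, 7200 rev/minute, 64 sectors per track).

```python
def disk_access_times(track_step_ms=0.02, tracks=1000, avg_tracks_moved=300,
                      rpm=7200, sectors_per_track=64):
    """Return best, worst and average access time (msec) for one sector."""
    revolution_ms = 60_000 / rpm                  # time for one full revolution
    transfer = revolution_ms / sectors_per_track  # one sector passes under the head
    seek = {"best": 0.0,
            "worst": (tracks - 1) * track_step_ms,
            "average": avg_tracks_moved * track_step_ms}
    latency = {"best": 0.0,
               "worst": revolution_ms,
               "average": revolution_ms / 2}
    return {case: seek[case] + latency[case] + transfer for case in seek}

for case, total in disk_access_times().items():
    print(case, round(total, 2))
# best 0.13
# worst 28.44
# average 10.3
```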

Homework: Check out the Practice Problem

on Page 244, then complete Exercise 12. No-

tice there are just 20 sectors per track, instead

of 64.

19

ALU

The ALU (Arithmetic and Logic Unit) is to get

things done. It contains circuits for arithmetic

addition, subtraction, multiplication, and di-

vision; as well as circuits for comparison and

logic equality, again in terms of the three gates

that we studied in the last chapter: AND, OR,

and NOT.

It also contains registers, which are high-speed,

dedicated memory cells connected to circuits.

All these are connected with data paths, which

let data flow in between registers and circuits.

20

How about a more general one?

There are certainly lots of registers in an ALU,

as shown below.

This one holds 16 registers, R0 through R15.

Any of them can hold operands of some op-

eration, as well as its result: typically two in,

and one result out, sent back to a register.

21

A little more on registers

A register can hold the operand of an arith-

metic operation or the result of an operation.

Registers are quite similar to the RAM cells,

but instead of using a numeric address, e.g.,

0011, we get access to them by a special reg-

ister designator, such as A, X, or R0.

Registers can be accessed much more quickly than regular RAM cells, because they are located inside a processor; and they are used only for specific purposes, because there are only a few of them.

A typical processor has between 12 and 24 registers. The more it has, the faster a program runs.

22

The control unit

The control unit is to

1) fetch from memory the next instruction to be executed (the figure below shows the typical format of such an instruction);

2) decode the instruction, using its OpCode, to determine what is to be done; and

3) execute it by issuing the appropriate commands to the processor (ALU), memory and I/O devices.

Starting with the first instruction, this process is repeated until the last one is executed. (We mentioned this on Page 2, and will present more details on Pages 35 and 36.)

The collection of all the instructions that can be decoded and executed by the control unit is called its machine language.

23

CU registers

To fetch and execute instructions, the con-

trol unit uses two special purpose registers, the

program counter (PC), and the instruction reg-

ister (IR) together with an instruction decoder

circuitry.

The program counter holds the address of the

next instruction to be executed.

To get that instruction, the CU sends the content of PC to the MAR, executes a fetch operation, and puts the fetched instruction itself into the instruction register (IR).

The controller will then try to figure out what

to do, and then get it done.

24

What to do?

The operation code (OpCode) part of the IR is

sent to an Instruction decoder to further de-

termine what is to be done.

If the OpCode field has k bits, there will be at most 2^k operations.

For each of these operations, this ALU will

carry out the corresponding operation, and the

above decoded information is further fed into

the Select lines of the ALU to choose the de-

sired result, among a bunch.

Question: Can you tell me all the details?

25

Here it goes...

After data flow into an ALU, it will 1) do everything, such as addition, subtraction, multiplication, division, AND, OR, NOT, comparison, etc.

It then 2) uses a multiplexer (Cf. Pages 51 and 52 of Chapter 4 notes) to choose the desired result according to the information sent over by the CU decoder, and sends it out to a register.

For example, a value 00 of the OpCode will en-

able and select the output of the adder through

the above 2 × 4 multiplexer.
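The "compute everything, then select" idea can be sketched in Python as follows. Only the pairing of OpCode 00 with the adder comes from the text; the other three OpCode assignments, and the function name `alu`, are hypothetical choices made for this illustration.

```python
def alu(opcode: int, a: int, b: int) -> int:
    """Compute every result, then let the OpCode select one of them."""
    # Step 1: all circuits work on the inputs at the same time.
    results = [
        a + b,   # 00: adder (as in the text)
        a - b,   # 01: subtractor (hypothetical assignment)
        a & b,   # 10: bitwise AND (hypothetical assignment)
        a | b,   # 11: bitwise OR (hypothetical assignment)
    ]
    # Step 2: the multiplexer passes on only the selected result.
    return results[opcode]

print(alu(0b00, 3, 4))  # 7
```

In hardware all four circuits run in parallel, so computing the unused results costs no extra time; in this software sketch it is simply the easiest way to mirror the structure.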

26

This is the computer...

... that we will use to get things done.

All the parts are hooked up with a bus.

We will talk about GT, EQ and LT on Page 31.

27

What about a program?

A machine language program consists of instructions, each of which contains an operation code and the addresses of its associated data (Page 22).

For example, the instruction Add X, Y, with its code being 9 (1001), means to add two values stored in X, located in 99, and Y, located in 100, and then put the result back in cell Y. It might look like the following:

00001001 0000000001100011 0000000001100100

In general, the operation code contains a unique integer assigned to each operation, which can then be recognized by the hardware.

The address code fields contain the addresses of the values that this operation will work with. There are usually between 0 and 3 such fields, since after getting two values, we may want to specify where the result is to be placed.
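The encoding of Add X, Y can be reproduced with a few lines of Python. This is a sketch under the format suggested by the example above, an 8-bit operation code followed by 16-bit address fields; the function name `encode` is made up for this illustration.

```python
def encode(opcode: int, *addresses: int) -> str:
    """Encode an instruction as an 8-bit OpCode plus 16-bit address fields."""
    fields = [format(opcode, "08b")]                    # operation code
    fields += [format(a, "016b") for a in addresses]    # zero or more address fields
    return " ".join(fields)

# Add X, Y with OpCode 9, X located in 99 and Y located in 100.
print(encode(9, 99, 100))
# 00001001 0000000001100011 0000000001100100
```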

Homework: Exercise 10

28

It all depends...

The set of all the operations executable by a processor is called its instruction set. There is no standard as to what should be included in this set.

Operations, and indeed, machine languages,

vary from machine to machine. That is why

iPhone 13, using A15 processor, cannot directly

execute a program written for Intel Core i7.

The machine languages on most computers are quite elementary: each instruction carries out a very small, specific, and simple task.

The power of a computer does not lie in what

it can do, but how quickly it can.

We will correct the above statement in CS 3780

Intro. to Computational Theory.

29

More or less?

The trend is to keep the set as small and simple

as possible. These machines are called reduced

instruction set computers (RISC), with just 30

to 50 different instructions.

This can minimize the hardware circuitry needed

to build a computer. A program for a RISC

machine may require more instructions, but

this is made up by the fact that they can be

executed much faster.

In the 1970s, a typical processor might have from 300 to 500 machine language instructions, and such machines are thus called complex instruction set computers (CISC). They are more complex, more expensive, and more difficult to construct. Most modern processors follow a mixture of these two approaches.

Check out the piece on the course page.

30

Instruction categories

Machine language instructions can be catego-

rized into four basic classes: data transfer,

arithmetic, compare, and branch.

1. Data transfer instructions move information

between, or within, the different components

of the computer, e.g., memory cells, ALU, and

registers.

For example, LOAD X means load register R with

CON(X), i.e., the content of a memory cell X;

and STORE X is just the opposite. For example

000000000000 000000000101

loads the content of location 5 into the register

R.

MOVE X Y is to move the CON(X) to memory cell

with address being Y.

31

2. Arithmetic instructions apply various oper-

ations on data.

For example, ADD X, Y, Z means to fill cell Z

with the sum of CON(X) and CON(Y); ADD X, Y

is to fill Y with the sum of CON(X) and CON(Y),

and ADD X is to fill R with the sum of CON(X)

and that of R.

3. Compare instructions compare two values

and set an indicator accordingly.

For example, COMPARE X Y is to compare CON(X)

and CON(Y), then set three condition codes, LT,

EQ, GT, accordingly.

If CON(X) > CON(Y), then GT=1, EQ=0, LT=0.

If CON(X) = CON(Y), then GT=0, EQ=1, LT=0.

If CON(X) < CON(Y), then GT=0, EQ=0, LT=1.

32

4. The general structure of the Von Neumann machine is sequential (Cf. Page 2). The branch instructions change this normal flow of control, based on the result of a condition test.

For example, JUMP X will take the next instruc-

tion located at X, or jump to X no matter what.

JUMPGT X will do a jump if the code GT is set to

1.

JUMPGE X will do a jump if either the code GT=1

or EQ=1.

JUMPLT, JUMPEQ, JUMPLE and JUMPNEQ function sim-

ilarly.

HALT indicates the end of the execution of the

program.
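The interplay between COMPARE and the jump instructions can be sketched in Python. The names `compare`, `JUMP_CONDITIONS` and `next_pc` are invented for this illustration; the conditions for JUMPLT, JUMPEQ, JUMPLE and JUMPNEQ are the natural analogues of JUMPGT and JUMPGE and are assumptions rather than definitions taken from the text.

```python
def compare(x: int, y: int) -> dict:
    """Set the three condition codes the way COMPARE X Y does."""
    return {"GT": int(x > y), "EQ": int(x == y), "LT": int(x < y)}

# Each jump instruction looks only at the condition codes.
JUMP_CONDITIONS = {
    "JUMP":    lambda c: True,                           # no matter what
    "JUMPGT":  lambda c: c["GT"] == 1,
    "JUMPGE":  lambda c: c["GT"] == 1 or c["EQ"] == 1,
    "JUMPLT":  lambda c: c["LT"] == 1,
    "JUMPEQ":  lambda c: c["EQ"] == 1,
    "JUMPLE":  lambda c: c["LT"] == 1 or c["EQ"] == 1,
    "JUMPNEQ": lambda c: c["EQ"] == 0,
}

def next_pc(instruction: str, target: int, pc: int, codes: dict) -> int:
    """Return the address of the next instruction to execute."""
    return target if JUMP_CONDITIONS[instruction](codes) else pc

codes = compare(7, 3)
print(codes)                            # {'GT': 1, 'EQ': 0, 'LT': 0}
print(next_pc("JUMPGT", 20, 5, codes))  # 20: the jump is taken
print(next_pc("JUMPLE", 20, 5, codes))  # 5: continue sequentially
```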

We are ready to look at some examples.

33

Examples

Check out the following classic examples.

It shows an algorithm, a program in Python, and one in a machine language.

We want to use an algorithm, but have to use a program, e.g., in Python, whose surface we will scratch later, with a lot more to come in CS2370: Intro. to Programming.

34

Fundamentals of Computing

Our Von Neumann machine, with a four-bit

OPcode, can run sixteen instructions.

In Lab 8, we will witness the actual process of

executing a program, in machine language, in

such a computer, by combining all the pieces

that we have talked about so far.

35

Algorithmically speaking

The execution of a program can be summa-

rized as follows, previewed on Page 23:

Repeat until a HALT instruction or an error

Fetch phase

Decode phase

Execute phase

Fetch refers to the following:

PC -> MAR      Send the address of the next instruction to MAR
FETCH          Fill MDR with the content of the cell whose address is in MAR
MDR -> IR      Fill IR with the content of MDR
PC+1 -> PC     Increase PC by 1, giving the address of the next instruction

36

Decode refers to the following:

IR(OPCode) -> instruction decoder

The opcode part of the instruction is sent to

the decoder, which is to activate the circuitry

to carry out this instruction.

Execute depends on the nature of the instruction. For example, Add X means to add the content of cell X to that of a register R, and put the sum back into R:

IR(address) -> MAR   The address of the operand goes to MAR
FETCH                The operand goes to MDR
MDR -> ALU           The operand goes to ALU
R -> ALU             The other value is also moved from R to ALU
ADD                  Add them up
ALU -> R             The sum goes back to R
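The whole fetch, decode and execute loop can be tried out with a tiny Python simulator. This is a simplified sketch, not the machine used in the lab: instructions are kept as symbolic (OpCode, address) pairs rather than binary strings, there is a single register R, and only the hypothetical subset LOAD, ADD, STORE and HALT is supported.

```python
def run(memory: list) -> int:
    """Run the program stored in memory from address 0; return register R."""
    pc, r = 0, 0
    while True:
        # Fetch phase
        mar = pc                   # PC -> MAR
        mdr = memory[mar]          # FETCH
        ir = mdr                   # MDR -> IR
        pc = pc + 1                # PC+1 -> PC
        # Decode phase
        opcode, address = ir       # IR(OPCode) -> instruction decoder
        # Execute phase
        if opcode == "HALT":
            return r
        mar = address              # IR(address) -> MAR
        if opcode == "LOAD":
            mdr = memory[mar]      # FETCH
            r = mdr                # MDR -> R
        elif opcode == "ADD":
            mdr = memory[mar]      # FETCH
            r = r + mdr            # MDR and R go to the ALU, the sum goes back to R
        elif opcode == "STORE":
            mdr = r                # R -> MDR
            memory[mar] = mdr      # STORE
        else:
            raise ValueError(f"unknown OpCode: {opcode}")

# Program and data share one memory (the stored program concept).
memory = [("LOAD", 5), ("ADD", 6), ("STORE", 6), ("HALT", 0), None, 3, 4]
print(run(memory))   # 7
print(memory[6])     # 7
```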

Assignment: Read through the rest of §5.3

37

What have we learned?

We saw, on Page 28, that to sum up two numbers kept in cells 99 and 100, we need to use the following instruction:

00001001 0000000001100011 0000000001100100

You will work out the details of executing such

an instruction in Lab 8.

ADD X Y

It is way too tough for us to work with this stuff.

Later on, we will see a more user-friendly way of doing our work, e.g.,

Y = X + Y;

We will also see how to translate a program

that we want to work with to an equivalent

one that a computer needs to work with.

38

The future

The problems that computers are asked to solve keep growing significantly in size and complexity. So far, we have been able to keep up with the demand by building faster and faster von Neumann machines.

The first generation can execute about 10,000

instructions per second. Today, even a PC can

do 1 MIPS (Million Instructions Per Second),

and larger ones can do 10 to 20 BIPS. However, such growth is slowing down.

Due to the limit of putting more and more

gates even closer on a chip, the famous Moore’s

law is stopping, and we now have this von Neu-

mann bottleneck.

One way to overcome this problem is to use

parallel processing architecture with multiple

processors. For example, the A15 processor

inside iPhone 13 has six processing cores.

39

Various types

Another type is called MIMD, also called cluster computing, where we apply different instructions to different sets of data. All these processors communicate with each other through an interconnection network.

For example, when looking for a number in a phone book, we can divide the book into many pieces and assign each piece to one processor. Then, the searching can be done in parallel with a linear speed-up.
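The divide-and-search idea can be sketched in Python with the standard `concurrent.futures` module. The function name `parallel_search` and the `workers` parameter are made up for this illustration. Note that Python threads mainly show the structure of the solution; they are not expected to deliver the linear speed-up that separate processors would.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_search(book: list, target, workers: int = 4) -> bool:
    """Split the book into pieces and search each piece with its own worker."""
    size = max(1, -(-len(book) // workers))   # piece size, rounded up
    pieces = [book[i:i + size] for i in range(0, len(book), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Every worker runs the same linear search on its own piece.
        found = pool.map(lambda piece: target in piece, pieces)
        return any(found)

phone_book = list(range(1000))
print(parallel_search(phone_book, 777))    # True
print(parallel_search(phone_book, 5000))   # False
```

The workers never need to talk to each other until the very end, which is why this problem parallelizes so easily.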

We will talk a lot more about computer architecture in CS 4250 Computer Architecture, and parallel programming in CS 3221 Algorithm Analysis and CS 4310 Operating Systems.

40

A couple of points

1. When making use of a MIMD architecture,

we have to keep all the processors as busy as possible; otherwise, we are wasting our precious resources.

2. We also have to minimize the amount of

inter-processor communication, since if a pro-

cessor talks too much, it won’t have much time

to work. This is easy in the aforementioned searching problem, but turns out to be difficult in general.

3. An effort has been made to build non-von Neumann types of computers: If you cannot build something that will work twice as fast, build something that can do two things at once.

Assignment: Read the rest of Section 5.4.

Lab time: Lab 8 on computer system with von

Neumann Machine

41
