TK6123: COMPUTER ORGANISATION & ARCHITECTURE
Prepared By: Associate Prof. Dr Masri Ayob
Lecture 7: CPU and Memory (2)
Contents
This lecture will discuss:
• Memory Unit.
• Instruction Execution.
• Buses.
Prepared by: Dr Masri Ayob - TK2123
Memory Implementations
The most common types of memory:
• magnetic core memory
• static RAM
• dynamic RAM
• ROM
Memory can be volatile or nonvolatile.
• Nonvolatile memory retains its values when power is removed.
• Volatile memory loses its contents when power is removed.
Memory Implementations
Magnetic core memory uses a small core of magnetic material to hold a bit of data.
• Since magnetism remains after the current is removed, core memory is nonvolatile.
• Magnetic core memory is expensive and slow in operation compared to other types of memory. It has been replaced almost entirely by RAM.
• It is still used on a few computers where both read and write capability are required and where the loss of data or programs would be severely damaging, particularly for military and space applications.
Memory Implementations
Most current computers use either static or dynamic RAM for memory.
Dynamic RAM (DRAM) is less expensive, requires less electrical power, and can be made smaller, with more bits of storage in a single integrated circuit.
• DRAM also requires extra electronic circuitry that “refreshes” memory periodically; otherwise the data fades away after a while and is lost.
Memory Implementations
Static RAM (SRAM) does not require refreshing.
Static RAM is also faster to access than DRAM and is therefore useful in very-high-speed computers and for small amounts of high-speed memory.
But SRAM is more expensive and requires more chips.
Both dynamic and static RAM are volatile.
Currently, DRAM is the most widely used type of main memory.
Memory Implementations
ROM (read-only memory) is used for situations where the software is built permanently into the computer.
Early ROM memory was made up of integrated circuits with fuses in them that could be blown.
Modern ROM memories use a different technology, which can be erased and rewritten.
Memory Implementations
Within the computer, ROM is both nonvolatile and unwriteable.
The method used to access memory is basically the same, regardless of memory type.
Memory Implementations
EEPROM and Flash ROM are recent memory innovations that implement nonvolatile, writeable memory.
• Both allow rewriting by erasing memory cells selectively, then writing new data into those cells.
• Flash ROM is faster and more flexible than EEPROM because it can erase and write data in blocks, rather than one byte at a time.
• Flash ROM is used in the computer BIOS and in devices, such as digital cameras, that require faster access than a disk can offer.
Primary Memory: Memory Addresses (1)
Three ways of organizing a 96-bit memory.
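The figure for this slide is not reproduced here. As a rough illustration, the sketch below assumes the three organisations are cells of 8, 12 and 16 bits (these widths are an assumption; the caption does not list them) and computes how many cells and how many address bits each one needs.

```python
TOTAL_BITS = 96  # total memory size, taken from the caption


def organise(total_bits, cell_bits):
    """Return (number of cells, address bits needed) for a given cell width."""
    cells = total_bits // cell_bits
    # Smallest number of bits that can number the cells 0 .. cells-1.
    address_bits = (cells - 1).bit_length()
    return cells, address_bits


# The cell widths below are assumed for illustration only.
for cell_bits in (8, 12, 16):
    cells, address_bits = organise(TOTAL_BITS, cell_bits)
    print(f"{cell_bits}-bit cells: {cells} cells, {address_bits} address bits")
```

The same total capacity needs fewer addresses when the cells are wider, because each address selects a whole cell rather than a single bit.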
Instruction Cycle
Two steps:
• Fetch
• Execute
Fetch/Execute Cycle
Program Counter (PC) holds the address of the next instruction to fetch: PC → MAR.
Processor fetches the instruction from the memory location pointed to by PC.
Increment PC.
• Unless told otherwise.
Fetch/Execute Cycle
Instruction is loaded into the Instruction Register (IR): MDR → IR.
Processor interprets the instruction and performs the required actions.
• If the instruction uses a word in memory, fetch the word into a CPU register.
Execute the instruction.
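The fetch/execute steps can be sketched as a small simulator. This is only a minimal illustration: the accumulator machine, its four-plus-one opcodes (`LOAD`, `ADD`, `STORE`, `JUMP`, `HALT`) and the tuple encoding of instructions are all invented for the example and are not part of the lecture.

```python
def run(memory, max_steps=100):
    """Run a toy accumulator machine until HALT and return the accumulator."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        mar = pc                  # PC -> MAR
        mdr = memory[mar]         # memory[MAR] -> MDR (the fetch)
        pc += 1                   # increment PC, unless told otherwise below
        ir = mdr                  # MDR -> IR
        opcode, operand = ir      # decode (hypothetical tuple encoding)
        if opcode == "LOAD":
            acc = memory[operand]       # fetch the memory word into a register
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "JUMP":
            pc = operand                # "told otherwise": overrides the increment
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    raise RuntimeError("step limit reached without HALT")


# Addresses 0-3 hold the program, addresses 4-6 hold data.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
print(run(program), program[6])
```

Instructions and data share one memory here, which is why the program must halt before the PC reaches address 4.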
Execute Cycle
Processor-memory
• Data transfer between CPU and main memory
Processor-I/O
• Data transfer between CPU and I/O module
Data processing
• Some arithmetic or logical operation on data
Control
• Alteration of sequence of operations
• e.g. jump
Combination of above
Design Principles for Modern Computers
All instructions directly executed by hardware.
Instructions should be easy to decode.
Only loads and stores should reference memory.
Provide plenty of registers.
Design Principles for Modern Computers
Maximise the rate at which instructions are issued:
• Split the fetch-execute cycle between two separate units:
  • a fetch unit to retrieve and decode instructions;
  • an execution unit to perform the actual instruction operation.
  This allows independent, concurrent operation of the two parts of the fetch-execute cycle.
• Pipelining to allow overlapping between the fetch-execute cycles of sequences of instructions.
• Separate execution units for different types of instructions.
SEPARATE FETCH UNIT/EXECUTE UNIT
To achieve maximum performance, these two parts operate as independently from each other as possible;
• but an instruction must be fetched before it can be decoded and executed.
Several instructions are fetched concurrently from memory by the fetch unit, based on the current address stored in an instruction pointer (IP, i.e. PC) register.
SEPARATE FETCH UNIT/EXECUTE UNIT
Once an instruction is fetched, it is held in a buffer until it can be decoded and executed.
The number of instructions held will depend upon the size of each instruction, the width of the memory bus, and the size of the buffer.
SEPARATE FETCH UNIT/EXECUTE UNIT
As instructions are executed, the fetch unit takes advantage of time when the bus is not otherwise being used and attempts to keep the buffer filled with instructions.
In general, modern memory buses are wide enough and fast enough that they do not limit instruction retrieval.
SEPARATE FETCH UNIT/EXECUTE UNIT
The execution unit contains the ALU and the portion of the control unit that identifies and controls the steps that comprise the execution part for each different instruction.
When the execution unit is ready for an instruction, the instruction decoder passes the new instruction to the control unit for execution.
Pipelining
Observe that the limitation to performance results from the serial nature of CPU processing:
• each instruction requires a sequence of fetch-execute cycle steps, and
• the program requires the execution of a sequence of these instructions.
• Thus, the keys to increased performance must rely on methods that reduce the time required for each step in the fetch-execute cycle.
Pipelining
Most instructions require many steps (clock cycles) to fetch/execute the instruction.
Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently.
To speed up processing, several independent instructions can be overlapped so that several instructions are being worked on at a time. This is pipelining.
Instruction-Level Parallelism
A five-stage pipeline
The state of each stage as a function of time. Nine clock cycles are illustrated
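Since the figure is not reproduced here, the stage-by-time table it describes can be regenerated with a short sketch. It assumes an ideal pipeline (one new instruction enters every cycle, no stalls or branches), and the stage names `S1` to `S5` are placeholders rather than names taken from the slide.

```python
STAGES = ["S1", "S2", "S3", "S4", "S5"]  # five stages; names are placeholders


def pipeline_table(num_cycles, stages=STAGES):
    """Map each stage to the instruction number it holds in cycles 1..num_cycles.

    Assumes an ideal pipeline: instruction i enters the first stage in cycle i
    and moves forward one stage per cycle. None marks an empty stage.
    """
    table = {}
    for depth, name in enumerate(stages):
        row = []
        for cycle in range(1, num_cycles + 1):
            instr = cycle - depth
            row.append(instr if instr >= 1 else None)
        table[name] = row
    return table


def total_cycles(num_instructions, num_stages):
    """Cycles needed to finish all instructions in an ideal pipeline."""
    return num_stages + num_instructions - 1


for name, row in pipeline_table(9).items():
    print(name, " ".join("." if i is None else str(i) for i in row))
```

Under these assumptions, `total_cycles(5, 5)` is 9: the first instruction takes five cycles to drain through the pipeline, and each later one completes one cycle after its predecessor.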
Pipelining
Problems:
• If a branch is taken, it may invalidate all the instructions in the pipeline at that instant.
• If the next instruction requires data from the previous instruction, the computer must still have that data before it can proceed.
Modern computers use a variety of techniques to compensate for the branching problem.
• One common approach is to maintain two or more separate pipelines so that instructions from both possible outcomes can be processed until the direction of the branch is clear.
• Another approach attempts to predict the probable branch path based on the history of previous execution of the same instruction.
The problem of waiting for data results from previous instructions can be alleviated by separating the instructions so that they are not executed one right after the other.
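History-based branch prediction can be illustrated with a 2-bit saturating counter per branch. This particular scheme is an assumption chosen for the example; the lecture only says that prediction uses the history of the same instruction. The class name, the starting counter value and the branch address are likewise hypothetical.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address (0-1: not taken, 2-3: taken)."""

    def __init__(self):
        self.counters = {}

    def predict(self, address):
        # Unseen branches start at 1 ("weakly not taken"), an arbitrary choice.
        return self.counters.get(address, 1) >= 2

    def update(self, address, taken):
        counter = self.counters.get(address, 1)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.counters[address] = counter


def count_correct(outcomes, address=0x40):
    """Replay the actual outcomes of one branch and count correct predictions."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct


# A loop branch: taken nine times, then not taken on exit.
loop_branch = [True] * 9 + [False]
print(count_correct(loop_branch), "of", len(loop_branch))
```

For a loop-closing branch like this one, the predictor is wrong only while warming up and on the final exit, so the pipeline rarely has to be flushed.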
Superpipelined
Many pipeline stages need less than half a clock cycle.
Doubling the internal clock speed gets two tasks done per external clock cycle.
Superscalar, by contrast, allows parallel fetch and execute.
Scalar and Superscalar Processor Organisation
It is not useful to pipe different types of instructions through a single pipeline.
With a single execution unit pipeline (ignoring problems with different instruction types and branch conditions),
• the CPU can average an instruction execution rate approximately equal to the clock speed of the machine, i.e. about one instruction per clock cycle.
• A processor fulfilling this condition is called a scalar processor.
Scalar and Superscalar Processor Organisation
With multiple execution units, it is possible to process instructions in parallel, with an average rate of more than one instruction per clock cycle.
• The ability to process more than one instruction per clock cycle is known as superscalar processing.
Pipelining and superscalar processing techniques do not affect the cycle time of any individual instruction.
Superscalar Architectures (1)
Dual five-stage pipelines with a common instruction fetch unit.
Superscalar Architectures (2)
A superscalar processor with five functional units.
Limitations
Technical issues that must be resolved to make it possible to execute multiple instructions simultaneously:
• True data dependency: Problems that arise from instructions completing in the wrong order.
• Procedural dependency: Changes in program flow due to branch instructions.
• Resource conflicts: Conflicts for internal CPU resources, particularly general-purpose registers.
• Output dependency
• Antidependency
True Data Dependency
ADD r1, r2 (r1 := r1+r2;)
MOVE r3,r1 (r3 := r1;)
Can fetch and decode second instruction in parallel with first
Can NOT execute second instruction until first is finished
Procedural Dependency
Conditional branch instructions may depend on the results from instructions that have not yet been executed.
• These situations are known as flow or branch dependencies.
• If the wrong branch is in the pipeline, the pipeline must be flushed and refilled, wasting time.
• Worse yet, an instruction from the wrong branch, that is, one that should not have been executed, can alter a previous result that is still needed.
Resource Conflict
Two or more instructions requiring access to the same resource at the same time
• e.g. two arithmetic instructions
Can duplicate resources
• e.g. have two arithmetic units
In-Order Issue Out-of-Order Completion
Output dependency
• R3 := R3 + R5; (I1)
• R4 := R3 + 1; (I2)
• R3 := R5 + 1; (I3)
• I2 depends on the result of I1 - true data dependency.
• If I3 completes before I1, I1 then overwrites R3 and leaves the wrong final value in it - output (write-write) dependency.
Antidependency
Read-write dependency
• R3 := R3 + R5; (I1)
• R4 := R3 + 1; (I2)
• R3 := R5 + 1; (I3)
• R7 := R3 + R4; (I4)
• I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3.
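The dependencies among I1 to I4 can be checked mechanically. The sketch below uses the usual definitions: a true data dependency is read-after-write (RAW), an antidependency is write-after-read (WAR), and an output dependency is write-after-write (WAW). The tuple encoding of instructions and the function name `classify` are invented for this example, and the check is pairwise only, so it ignores any writes by instructions in between.

```python
def classify(earlier, later):
    """Return the set of dependencies that `later` has on `earlier`.

    Each instruction is (destination register, tuple of source registers).
    """
    dest_e, srcs_e = earlier
    dest_l, srcs_l = later
    deps = set()
    if dest_e in srcs_l:
        deps.add("true data (RAW)")   # later reads what earlier writes
    if dest_l in srcs_e:
        deps.add("anti (WAR)")        # later overwrites what earlier reads
    if dest_e == dest_l:
        deps.add("output (WAW)")      # both write the same register
    return deps


I1 = ("R3", ("R3", "R5"))   # R3 := R3 + R5
I2 = ("R4", ("R3",))        # R4 := R3 + 1
I3 = ("R3", ("R5",))        # R3 := R5 + 1
I4 = ("R7", ("R3", "R4"))   # R7 := R3 + R4

for label, pair in {"I1->I2": (I1, I2), "I2->I3": (I2, I3),
                    "I1->I3": (I1, I3), "I3->I4": (I3, I4)}.items():
    print(label, sorted(classify(*pair)))
```

Only the RAW cases are fundamental to the program; the WAR and WAW cases come from reusing R3, which is why they are often described as storage conflicts.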
Design Issues
Instruction level parallelism
• Instructions in a sequence are independent
• Execution can be overlapped
• Governed by data and procedural dependency
Machine Parallelism
• Ability to take advantage of instruction level parallelism
• Governed by number of parallel pipelines
Instruction Issue Policy
Order in which instructions are fetched.
Order in which instructions are executed.
Order in which instructions change registers and memory.
Processor-Level Parallelism (1)
An array processor of the ILLIAC IV type.
Processor-Level Parallelism (2)
A single-bus multiprocessor.
A multicomputer with local memories.
Buses
There are a number of possible interconnection systems
Single and multiple BUS structures are most common
e.g. Control/Address/Data bus (PC)
e.g. Unibus (DEC-PDP)
What is a Bus?
A communication pathway connecting two or more devices.
Is a physical connection for transferring data from one location in the computer system to another.
Definition: A group of electrical conductors suitable for carrying computer signals from one location to another.
• Each conductor in the bus is commonly known as a line.
• Each line carries a single electrical signal, which might represent one bit of a memory address or a sequence of data bits.
Often grouped
• A number of channels in one bus
• e.g. a 32-bit data bus is 32 separate single-bit channels
Power lines may not be shown.
Data Bus
Carries data
• Remember that there is no difference between “data” and “instruction” at this level.
Width is a key determinant of performance
• 8, 16, 32, 64 bit
Address bus
Identify the source or destination of data
e.g. CPU needs to read an instruction (data) from a given location in memory
Bus width determines the maximum memory capacity of the system
• e.g. the 8080 has a 16-bit address bus, giving a 64K address space
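The relationship between address bus width and address space is simply a power of two, as the short sketch below shows. The helper names are made up for the example, and the 20-bit and 32-bit widths are extra illustrative values rather than figures from the lecture.

```python
def address_space(address_bits):
    """Number of distinct locations addressable with the given bus width."""
    return 2 ** address_bits


def format_size(locations):
    """Render a location count using binary K/M/G multiples where exact."""
    for unit, factor in (("G", 2 ** 30), ("M", 2 ** 20), ("K", 2 ** 10)):
        if locations >= factor and locations % factor == 0:
            return f"{locations // factor}{unit}"
    return str(locations)


# 16 bits is the 8080 example; the other widths are illustrative only.
for bits in (16, 20, 32):
    print(f"{bits}-bit address bus -> {format_size(address_space(bits))} locations")
```

Each extra address line doubles the number of locations that can be selected.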
Control Bus
Control and timing information
• Provides control for the proper synchronisation and operation of the bus and of the modules that are connected to the bus:
  • Memory read/write signal
  • Interrupt request
  • Bus request
  • Clock signals
  • Etc.
Bus Interconnection Scheme
Big and Yellow?
What do buses look like?
• Parallel lines on circuit boards
• Ribbon cables
• Strip connectors on motherboards
  • e.g. PCI
• Sets of wires
Physical Realisation of Bus Architecture
Buses
Buses may connect modules together in various ways.
A bus may carry signals from a specific source to a specific destination - point-to-point bus.
• E.g. the cable that connects the parallel or serial port of a personal computer to a printer.
• Point-to-point buses intended for connection to a plug-in device are often called ports.
Buses
Multipoint bus (or multidrop or broadcast bus) - is used to connect several points together, where signals produced by a source on the bus are “broadcast” to every other point on the bus.
• E.g. an Ethernet network.
• In most cases, a multipoint bus requires addressing signals on the bus to identify the desired destination that is being addressed by the source at a particular time.
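The role of addressing on a multipoint bus can be shown with a small model: every attached device sees every signal, but only the one whose address matches accepts the data. The class and method names are invented for this sketch, and it models the logical behaviour only, not electrical signalling or collisions.

```python
class Device:
    """A module attached to the bus, identified by a numeric address."""

    def __init__(self, address):
        self.address = address
        self.received = []

    def on_bus_signal(self, destination, data):
        # Every device sees the signal; only the addressed one accepts it.
        if destination == self.address:
            self.received.append(data)


class MultipointBus:
    """Broadcasts each transfer to all attached devices."""

    def __init__(self):
        self.devices = []

    def attach(self, device):
        self.devices.append(device)

    def send(self, destination, data):
        for device in self.devices:
            device.on_bus_signal(destination, data)


bus = MultipointBus()
devices = [Device(address) for address in (1, 2, 3)]
for device in devices:
    bus.attach(device)

bus.send(2, "hello")
print([device.received for device in devices])
```

A point-to-point bus needs no destination field at all, because there is only one possible receiver.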
Buses
Single Bus Problems
Lots of devices on one bus leads to:
• Propagation delays
  • Long data paths mean that co-ordination of bus use can adversely affect performance
• Bottlenecks if aggregate data transfer approaches bus capacity
Most systems use multiple buses to overcome these problems
Buses
Backplane (also called system bus or external bus; an example of a broadcast bus): a bus that carries computer signals connecting the CPU with memory and/or with a set of plug-in I/O module cards in the same physical package.
Different buses might be used for connecting the different parts of the system.
The interfaces between different buses are called bus interface bridges; they make it possible for different buses to communicate with each other.
Buses
The buses connecting various parts of the CPU are actually within the CPU chip.
Bus protocol - an agreement between two or more entities that establishes a clear, common path of communication and understanding between them.
Buses
The external CPU bus (backplane):
• Peripheral Component Interconnect (PCI) bus - a popular modern external bus, which is used in Sun workstations, Apple Macintosh computers, Intel PCs, and Hewlett-Packard AlphaServers.
  • This means that the same peripheral I/O cards may be plugged into many different computers.
• AGP (Accelerated Graphics Port) bus.
• ISA (Industry Standard Architecture) bus - was the standard system bus for Intel PCs for many years, but is rapidly becoming extinct in favor of the faster and more flexible PCI bus for general I/O interface use.
Typical PC interconnections
Bus Types
Dedicated
• Separate data & address lines
Multiplexed
• Shared lines
• Address valid or data valid control line
• Advantage - fewer lines
• Disadvantages
  • More complex control
  • Reduced ultimate performance
Bus Arbitration
More than one module controlling the bus
e.g. CPU and DMA controller
Only one module may control bus at one time
Arbitration may be centralised or distributed
Centralised or Distributed Arbitration
Centralised
• Single hardware device controlling bus access
  • Bus Controller
  • Arbiter
• May be part of CPU or separate
Distributed
• Each module may claim the bus
• Control logic on all modules
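Centralised arbitration can be sketched as a single object that decides which module owns the bus. The fixed-priority rule used here (lowest number wins) and the priorities given to the CPU and DMA controller are assumptions for illustration; the lecture does not specify how the arbiter chooses.

```python
class CentralArbiter:
    """Grants the bus to one module at a time; lowest priority number wins."""

    def __init__(self):
        self.owner = None

    def arbitrate(self, requests):
        """requests maps module name -> priority. Return the current bus owner."""
        # The bus is only granted when it is free, so only one module
        # controls it at any one time.
        if self.owner is None and requests:
            self.owner = min(requests, key=requests.get)
        return self.owner

    def release(self):
        """Called by the owning module when its transfer is finished."""
        self.owner = None


arbiter = CentralArbiter()
print(arbiter.arbitrate({"CPU": 2, "DMA": 1}))  # both request at once
print(arbiter.arbitrate({"CPU": 2}))            # bus is still held
arbiter.release()
print(arbiter.arbitrate({"CPU": 2}))            # now the CPU gets the bus
```

In a distributed scheme the same decision logic would be replicated in every module instead of living in one arbiter.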
Timing
Co-ordination of events on bus
Synchronous
• Events determined by clock signals
• Control Bus includes clock line
• A single 1-0 transmission is a bus cycle
• All devices can read clock line
• Usually a single cycle for an event
PCI Bus
Peripheral Component Interconnect
Released by Intel to the public domain
32 or 64 bit
50 lines
PCI Bus Lines (required)
System lines
• Including clock and reset
Address & Data
• 32 time-multiplexed lines for address/data
• Lines to interpret & validate the address/data signals
Interface Control
Arbitration
• Not shared
• Direct connection to PCI bus arbiter
Error lines
PCI Bus Lines (Optional)
Interrupt lines
• Not shared
Cache support
64-bit Bus Extension
• Additional 32 lines
• Time multiplexed
• 2 lines to enable devices to agree to use 64-bit transfer
JTAG/Boundary Scan
• For testing procedures
PCI Commands
Transaction between initiator (master) and target
Master claims bus
Determine type of transaction
• e.g. I/O read/write
Address phase
One or more data phases
Thank you
Q & A