Post on 17-Apr-2018
MICROPROCESSORS
Lecture 2: identify the core components of a CPU
Networks and Communication Department
By: Latifa ALrashed
Outline
¨ Identify the core components of a CPU
¤ EDB
¤ Registers
¤ Codebook
¤ Clock
¨ Describe the relationship of CPU and RAM
¨ Pipelining
¨ CPU Cache
Talking to the Man
¨ Imagine 16 lights
¤ 8 on the inside and 8 on the outside
¤ When an inside light is on, the corresponding outside light is on.
¤ We can switch these lights on and off.
¤ This communication system is like the external data bus
Talking to the Man (Cont.)
¨ In reality, a lot of little wires flash on or off
¤ Voltage is applied or not
¤ Represented not as on, on, off, off… but as 1, 1, 0, 0…
[Figure: eight wires with states On, Off, On, Off, On, Off, On, On, read as 1, 0, 1, 0, 1, 0, 1, 1]
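The on/off to 1/0 mapping can be sketched in a few lines of Python. The helper name `wires_to_bits` is an illustrative assumption, not part of the lecture; the pattern is the eight wire states shown on the slide.

```python
# Hypothetical helper: turn a list of wire states into a binary string.
def wires_to_bits(states):
    """Map each wire state ("on"/"off") to "1"/"0" and join them."""
    return "".join("1" if state == "on" else "0" for state in states)

# The eight wire states from the slide, in order.
pattern = ["on", "off", "on", "off", "on", "off", "on", "on"]
bits = wires_to_bits(pattern)

print(bits)          # 10101011
print(int(bits, 2))  # 171, the same pattern read as an unsigned integer
```

The pattern itself carries no meaning until a codebook assigns one; the integer reading is just one possible interpretation.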
Talking to the Man (Cont.)
¨ We need some sort of codebook that assigns meanings to the many different patterns
¨ You can see little wires sticking out of the CPU
¨ The figure shows a close-up of the underside of a CPU
External Data Bus
¨ The CPU communicates with the outside world using the external data bus (EDB)
¤ Instead of light bulbs, the EDB is made up of tiny wires
¤ The state of a wire is expressed in a binary format, with zeroes and ones
¤ This “1 and 0” or binary system is used to describe the state of these wires at any given moment.
¤ Each state represents a line of code in a program
Registers
¨ Registers are tiny storage areas on the CPU, microscopic semiconductor circuits.
¨ They provide the Man in the Box with a workplace for the problems you give him.
¨ He needs at least four worktables.
Registers (Cont.)
¨ Each of these four worktables has 16 light bulbs. (16 bits)
¨ All CPUs contain a large number of registers
¨ The four most commonly used ones are the general-purpose registers.
¨ Intel gave them the names AX, BX, CX, and DX.
¤ AX (Accumulator Register)
¤ BX (Base Register)
¤ CX (Count Register)
¤ DX (Data Register)
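As a rough sketch, the four worktables can be modeled as named 16-bit slots. The dictionary, the `store` helper, and the wrap-around behavior on oversized values are simplifying assumptions for illustration, not a description of the real 8088 circuitry.

```python
MASK_16 = 0xFFFF  # 16 "light bulbs" per register

# The four general-purpose registers named on the slide, all starting at 0.
registers = {"AX": 0, "BX": 0, "CX": 0, "DX": 0}

def store(name, value):
    """Hypothetical helper: keep only the low 16 bits of the value."""
    registers[name] = value & MASK_16

store("AX", 65535)   # largest value that fits in 16 bits
store("BX", 65536)   # needs 17 bits, so only the low 16 bits (0) are kept

print(format(registers["AX"], "016b"))  # 1111111111111111
print(registers["BX"])                  # 0
```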
The codebook
¨ The Man in the Box needs one more tool: the codebook, or instruction set
¤ Called the microprocessor’s machine language
¤ One command is a line of code
¤ Here are some examples of real machine language for the Intel 8088

Instruction   Meaning
10111010      The next line of code is a number. Put that number into the DX register
01000001      Add 1 to the number already in the CX register
00111100      Compare the value in the AX register with the next line of code
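One way to picture the codebook is as a lookup table from bit patterns to meanings. The sketch below uses only the three 8088 examples from the slide; the names `CODEBOOK` and `decode` are illustrative assumptions.

```python
# The three example instructions from the slide, keyed by bit pattern.
CODEBOOK = {
    "10111010": "The next line of code is a number. Put that number into the DX register",
    "01000001": "Add 1 to the number already in the CX register",
    "00111100": "Compare the value in the AX register with the next line of code",
}

def decode(pattern):
    """Look up a pattern; anything not in this tiny table is reported as unknown."""
    return CODEBOOK.get(pattern, "unknown instruction")

print(decode("01000001"))  # Add 1 to the number already in the CX register
print(decode("11111111"))  # unknown instruction
```

A real instruction set has far more entries than this table, but the principle is the same: the pattern on the EDB only means something once it is looked up.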
The codebook (Cont.)
¨ By placing machine language commands called lines of code onto the external data bus one at a time, you can instruct the Man in the Box to do specific tasks.
¨ All of the machine language commands that the CPU understands make up the CPU’s instruction set.
Clock
¨ The CPU does no work until told to, even though data may be on the EDB
¨ You need a buzzer to tell the Man in the Box to start
¤ This is referred to as a clock
¤ A clock is actually a stream of pulses
¨ Of course, a real computer doesn’t use a buzzer. The buzzer on a real CPU is a special wire called the CLOCK wire (most diagrams label the clock wire CLK).
¨ A charge on the CLK wire tells the CPU there’s another piece of information waiting to be processed
[Figure: “zz” and “Time to work” panels, each showing 10000101 and 00110101 on the EDB]
Clock (Cont.)
¨ A clock cycle is a single charge (one pulse) on the CLK wire
¨ Actually, the CPU requires at least two clock cycles to act on a command, and usually more.
¨ In fact, a CPU may require hundreds of clock cycles to process some commands
Clock (Cont.)
¨ The maximum number of clock cycles that a CPU can handle in a given period of time is referred to as its clock speed
¨ The clock speed is the fastest speed at which a CPU can operate, determined by the CPU manufacturer.
¨ Clock speed is the rated speed of the CPU, measured in hertz (Hz): cycles (ticks) per second.
Clock (Cont.)
¨ For example, the Intel 8088 processor had a clock speed of 4.77 MHz (4.77 million cycles per second)
¨ 1 hertz (1 Hz) = 1 cycle per second
¨ 1 kilohertz (1 kHz) = 1 thousand cycles per second
¨ 1 megahertz (1 MHz) = 1 million cycles per second
¨ 1 gigahertz (1 GHz) = 1 billion cycles per second
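The unit definitions above translate directly into the length of one clock cycle. The following sketch computes it for the 8088's 4.77 MHz; the function name `cycle_time_ns` is an illustrative assumption.

```python
MHZ = 1_000_000  # 1 MHz = 1 million cycles per second

def cycle_time_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds (1 second = 1e9 ns)."""
    return 1e9 / clock_hz

period = cycle_time_ns(4.77 * MHZ)
print(round(period, 1))      # 209.6 ns per cycle on a 4.77 MHz 8088
print(round(2 * period, 1))  # 419.3 ns for a command that needs two cycles
```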
Diagram of an Intel 8088 showing the external data bus and clock wires
In Summary
¨ The CPU is like a man in a box
¨ The external data bus gets data in and out of the CPU
¨ Registers are used as temporary storage inside the CPU
¨ The instruction set is like a codebook
¨ The clock defines the speed of the CPU
How the CPU executes program code
¨ Try the following simple exercise to see how the process works.
¨ Tell the 8088 CPU to add 2 + 3.
Exercise
1. Place 10000000 on the external data bus (EDB).
2. Place 00000010 on the EDB.
3. Place 10010000 on the EDB.
4. Place 00000011 on the EDB.
5. Place 10110000 on the EDB.
6. Place 11000000 on the EDB.
¨ When you finish Step 6, the value on the EDB will be 00000101, the decimal number 5 written in binary.
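The six steps can be replayed with a toy interpreter. The slide does not spell out what each command pattern means, so the meanings in the comments (load AX, load BX, add BX to AX, place AX on the EDB) are assumptions inferred from the exercise's stated result. The function `run` is hypothetical, and values are kept to 8 bits to match the 8-bit patterns shown.

```python
def run(program):
    """Feed lines of code to a toy CPU one at a time; return what it puts on the EDB."""
    regs = {"AX": 0, "BX": 0}
    edb_out = None
    lines = iter(program)
    for line in lines:
        if line == "10000000":    # assumed: the next line is a number, put it in AX
            regs["AX"] = int(next(lines), 2)
        elif line == "10010000":  # assumed: the next line is a number, put it in BX
            regs["BX"] = int(next(lines), 2)
        elif line == "10110000":  # assumed: add BX to AX, keep the result in AX
            regs["AX"] = (regs["AX"] + regs["BX"]) & 0xFF
        elif line == "11000000":  # assumed: place the value of AX on the EDB
            edb_out = format(regs["AX"], "08b")
        else:
            raise ValueError(f"unknown instruction {line}")
    return edb_out

program = ["10000000", "00000010",  # steps 1-2
           "10010000", "00000011",  # steps 3-4
           "10110000",              # step 5
           "11000000"]              # step 6
result = run(program)
print(result)  # 00000101, the decimal number 5
```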
Connection between the CPU and the RAM
¨ The program itself is stored on the hard drive.
¨ The hard drive is too slow to feed the CPU directly
¨ Memory (RAM) takes copies of programs from the hard drive and then sends them, one line at a time, to the CPU quickly enough to keep up with its demands.
¨ RAM must also store the results of the programs.
¨ This must be done at, or at least near, the clock speed of the CPU.
¨ The CPU needs a way to address each line of this memory
Memory controller chip
¨ The Memory controller chip (MCC) contains special circuitry that enables it to grab the contents of any single line of RAM and place that data or command on the external data bus.
¨ This in turn enables the CPU to act on that code
Address Bus
¨ The address bus enables the CPU to tell the MCC which line of code it needs
¨ Different CPUs have different numbers of wires in their address bus
¨ The 8088 had 20 wires in its address bus
¨ If you know the number of wires in the CPU’s address bus, you know the maximum amount of RAM that a particular CPU can handle.
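The division of labor between the address bus, the MCC, and the EDB can be sketched as follows. The class name, the `fetch` method, and the four-line RAM contents are illustrative assumptions; a real MCC is circuitry, not software.

```python
class MemoryControllerChip:
    """Toy MCC: given an address, put that line of RAM on the EDB."""

    def __init__(self, ram):
        self.ram = ram  # one byte per line of RAM

    def fetch(self, address):
        # Return the addressed byte as the 8-bit pattern that appears on the EDB.
        return format(self.ram[address], "08b")

# A tiny stand-in for RAM holding four lines of code.
ram = bytearray([0b10000000, 0b00000010, 0b10010000, 0b00000011])
mcc = MemoryControllerChip(ram)

edb = mcc.fetch(2)  # the CPU asks for line 2 over the address bus
print(edb)          # 10010000
```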
Address Bus
¨ Another set of wires in addition to the external data bus
¨ Used by the CPU to tell the Northbridge (labeled MCC in the figure) which line of code it wants from RAM
Address Bus
¨ The number of wires in the address bus determines the maximum amount of RAM the CPU can handle
¤ An 8088 had 20 wires, which provided 2^20 combinations (1,048,576 or 1 MB)
¤ Many current CPUs use 36 wires, which provide 2^36 combinations (68,719,476,736 or 64 GB)
How many patterns?
¨ The 8088 had a 20-wire address bus,
¨ so you can say that the 8088 could handle a maximum of one megabyte (1 MB) of RAM. How?
¨ If you have 20 wires, you have 2^20 (or 1,048,576) combinations. Each pattern points to one line of RAM, and each line of RAM is one byte.
¨ Therefore, the 8088 had an address space of 1,048,576 bytes.
¨ The most RAM it could handle was 2^20, or 1,048,576, bytes.
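The arithmetic is easy to check: each wire doubles the number of patterns, and each pattern addresses one byte. The function name below is an illustrative assumption.

```python
def address_space_bytes(wires):
    """Each address wire doubles the number of patterns; one pattern = one byte."""
    return 2 ** wires

MB = 2 ** 20  # bytes in one megabyte
GB = 2 ** 30  # bytes in one gigabyte

print(address_space_bytes(20))        # 1048576 bytes
print(address_space_bytes(20) // MB)  # 1 (MB) for the 8088's 20 wires
print(address_space_bytes(36))        # 68719476736 bytes
print(address_space_bytes(36) // GB)  # 64 (GB) for a 36-wire address bus
```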
Which Pattern Goes to Which Row?
¨ The CPU identifies the first byte of RAM on the address bus as 00000000000000000000
¨ The CPU identifies the last RAM row with 11111111111111111111 (the 1,048,576th line of RAM)
¨ The address bus also addresses all the other rows of RAM in between
¨ So, the CPU can access any row of RAM it needs
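A short sketch of how a row number maps to a 20-wire pattern, assuming rows are numbered from 0. The helper `address_pattern` is hypothetical.

```python
WIRES = 20  # address bus width of the 8088

def address_pattern(row):
    """Return the on/off pattern (as 1s and 0s) that selects a given RAM row."""
    if not 0 <= row < 2 ** WIRES:
        raise ValueError("row is outside the address space")
    return format(row, f"0{WIRES}b")

first = address_pattern(0)               # first byte of RAM
last = address_pattern(2 ** WIRES - 1)   # the 1,048,576th line of RAM

print(first)               # 00000000000000000000
print(last)                # 11111111111111111111
print(address_pattern(5))  # 00000000000000000101, one of the rows in between
```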
Traditional Pipeline Concept
¨ Laundry Example
¨ Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
¨ Washer takes 30 minutes
¨ Dryer takes 40 minutes
¨ “Folder” takes 20 minutes
[Figure: the four loads, labeled A, B, C, D]
Traditional Pipeline Concept
¨ Sequential laundry takes 6 hours for 4 loads
¨ If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight; loads A, B, C, D run one after another, each taking 30 + 40 + 20 minutes]
Traditional Pipeline Concept
¨ Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline from 6 PM to midnight, task order A to D; the pipelined schedule runs in segments of 30, 40, 40, 40, 40, 20 minutes]
Traditional Pipeline Concept
¨ Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
¨ Pipeline rate is limited by the slowest pipeline stage
¨ Multiple tasks operate simultaneously using different resources
[Figure: the pipelined laundry timeline again, 6 PM to 9 PM, task order A to D, segments of 30, 40, 40, 40, 40, 20 minutes]
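The 6-hour and 3.5-hour figures can be reproduced with a small scheduling sketch. It assumes each machine handles one load at a time and that a finished load can wait in a basket until the next machine is free; the function names are illustrative.

```python
STAGES = [30, 40, 20]  # washer, dryer, "folder" times in minutes
LOADS = 4              # Ann, Brian, Cathy, Dave

def sequential_minutes(stages, loads):
    """Each load goes through every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(stages, loads):
    """A load starts a stage once it has left the previous stage and the stage is free."""
    free_at = [0] * len(stages)  # when each machine next becomes free
    for _ in range(loads):
        ready = 0  # when this load is ready for its next stage
        for s, duration in enumerate(stages):
            start = max(ready, free_at[s])
            free_at[s] = start + duration
            ready = free_at[s]
    return free_at[-1]

print(sequential_minutes(STAGES, LOADS) / 60)  # 6.0 hours
print(pipelined_minutes(STAGES, LOADS) / 60)   # 3.5 hours
print(pipelined_minutes(STAGES, 1))            # 90 minutes: one load is no faster
```

The 40-minute dryer sets the pace: after the first wash, a load finishes drying every 40 minutes, which gives 30 + 4 × 40 + 20 = 210 minutes.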
Use the Idea of Pipelining in a Computer
1. Fetch: Get the data from the EDB
2. Decode: Figure out what type of command needs to be done
3. Execute: Perform the calculation
4. Write: Send the data back onto the EDB
¨ Discrete circuits inside your CPU handle each of these stages.
Pipelining
¨ In early CPUs, when a command was placed on the EDB, each stage did its job and the CPU handed back the answer before starting the next command, requiring at least four clock cycles to process a command.
¨ In every clock cycle, three of the four circuits sat idle.
Pipelining (Cont.)
¨ Today, the circuits are organized in a fashion called a pipeline.
¨ With pipelining, each stage does its job with each clock cycle pulse
¨ The CPU has multiple circuits doing multiple jobs
Pipelining (Cont.)
¨ Pipelines keep every stage of the processor busy on every click of the clock, making a CPU run more efficiently without increasing the clock speed.
¨ No CPU ever made has fewer than four stages, but advancements in caching have increased the number of stages over the years.
¨ Current CPU pipelines contain many more stages, up to 20 in some cases.
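A back-of-the-envelope comparison of the two designs, assuming an ideal four-stage pipeline where every stage takes exactly one clock cycle and nothing ever stalls. The function names are illustrative.

```python
def cycles_unpipelined(commands, stages=4):
    """Early CPUs: each command passes through all stages before the next begins."""
    return commands * stages

def cycles_pipelined(commands, stages=4):
    """Ideal pipeline: fill it once, then one command completes per clock cycle."""
    if commands == 0:
        return 0
    return stages + (commands - 1)

n = 100
print(cycles_unpipelined(n))  # 400 clock cycles
print(cycles_pipelined(n))    # 103 clock cycles
print(round(cycles_unpipelined(n) / cycles_pipelined(n), 2))  # 3.88 times faster
```

The gain approaches the number of stages as the command count grows, without any change in clock speed.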
Pipelining (Cont.)
[Figure: Basic idea of instruction pipelining. (a) Sequential execution: instructions I1, I2, I3 are each fetched (F) and then executed (E) in turn. (b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit. (c) Pipelined execution: the fetch and execute steps of I1, I2, I3 overlap across clock cycles 1 to 4.]
Pipeline Performance
¨ The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
¨ However, this increase would be achieved only if all pipeline stages require the same time to complete, and there is no interruption throughout program execution.
¨ Unfortunately, these conditions do not always hold.
Pipeline Performance
¨ The previous pipeline is said to have been stalled for two clock cycles.
¨ Any condition that causes a pipeline to stall is called a hazard.
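The cost of a stall can be illustrated with a simplified model: a four-stage pipeline in which only the execute stage may take more than one clock cycle, so every extra execute cycle holds up the commands behind it. The model and function names are assumptions for illustration.

```python
def total_cycles(execute_cycles, other_stages=3):
    """Cycles to finish all commands when only execute can take extra cycles.

    The other stages (fetch, decode, write) add a fixed fill/drain cost;
    the execute stage then processes the commands back to back.
    """
    return other_stages + sum(execute_cycles)

def stall_cycles(execute_cycles):
    """Extra cycles beyond the ideal of one execute cycle per command."""
    return sum(cycles - 1 for cycles in execute_cycles)

ideal = [1, 1, 1, 1]     # four simple commands
stalled = [1, 3, 1, 1]   # the second command needs three execute cycles

print(total_cycles(ideal))    # 7
print(total_cycles(stalled))  # 9
print(stall_cycles(stalled))  # 2 clock cycles lost to the stall
```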
Pipelining isn’t perfect
¨ Sometimes a stage hits a complex command that requires more than one clock cycle, forcing the pipeline to stop. These stops are called pipeline stalls.
¨ Certain commands are complex and therefore harder to decode than other commands.
¨ The Pentium used two decode stages to reduce the chance of pipeline stalls due to complex decoding
Pipelining (Cont.)
¨ The inside of the CPU is composed of multiple chunks of circuitry to handle the different types of calculations your PC needs to do.
¨ For example, one part, the integer unit, handles integer math: basic math for numbers with no decimal point.
¨ The typical CPU spends more than 90 percent of its work doing integer math.
¨ But the Pentium also had special circuitry to handle numbers with a decimal point, called the floating point unit (FPU).
¨ With a single pipeline, only the integer unit or the floating point unit worked at any execution stage. This is the second issue.
Pipelining (Cont.)
¨ Worse yet, floating point calculations often took many, many clock cycles to execute, forcing the CPU to stall the pipeline until the floating point unit finished executing the complex command (as shown in the figure “Bored ALU”)
Pipelining (Cont.)
¨ Intel gave the Pentium two pipelines: one main, “do everything” pipeline and one that only handled integer math.
¨ Although this didn’t stop pipeline stalls, the Pentium at least had a second pipeline that kept running when the main one stalled
Pipelining (Cont.)
¨ The two pipelines on the old Pentium were so successful that Intel and AMD added more and more pipelines to subsequent CPUs
¨ One of the biggest differences between equivalent AMD and Intel processors is the pipelines.
¨ AMD tends to go for lots of short pipelines whereas Intel tends to go with just a few long pipelines.
Cache
¨ Cache is a separate storage area used for quick access to data
¨ The CPU runs faster than RAM
¨ As a result, you’ll always get pipeline stalls, called wait states, due to the RAM not keeping up with the CPU.
¨ To reduce wait states, the Pentium came with built-in, very high speed RAM called static RAM (SRAM).
¨ Using a faster RAM cache close to the CPU helps the CPU run without waiting
¨ There are 2 types of cache memory:
¤ L1: inside the CPU
¤ L2: on the motherboard outside the CPU
[Figure: lines of binary code held in RAM and in the RAM cache on their way to the CPU]
L1 and L2 Cache
¨ L1 is small (16-32 KB) and runs almost as fast as the CPU. It is now part of the CPU.
¨ L2 cache is larger (64 KB to 1 MB) and runs slower than the CPU.
¨ L2 was originally external to the CPU and is now also part of the CPU.
¨ For our next instruction, we go from the CPU to L1 cache, then to L2 cache, then to RAM
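The lookup order can be sketched with three dictionaries standing in for L1, L2, and RAM. The `read` function, the copy-on-miss behavior, and the absence of any size limit or eviction are simplifying assumptions; real caches are fixed-size hardware.

```python
def read(address, l1, l2, ram):
    """Look for a line in L1, then L2, then RAM; report where it was found."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        l1[address] = l2[address]  # copy into L1 for next time
        return l1[address], "L2"
    value = ram[address]           # slowest path: go all the way to RAM
    l2[address] = value
    l1[address] = value
    return value, "RAM"

ram = {0: "10000101", 1: "00110101"}
l1, l2 = {}, {}

first = read(0, l1, l2, ram)   # nothing cached yet
second = read(0, l1, l2, ram)  # now served from L1
print(first)   # ('10000101', 'RAM')
print(second)  # ('10000101', 'L1')
```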
L2 Cache
¨ L2 was originally on the motherboard
¤ Referred to as external cache
¤ L2 cache is not uncommon on today’s CPUs