8/12/2019 Hardware Fundamentals(1)
Some Hardware Fundamentals and an Introduction to Software
In order to comprehend fully the function of system software, it is vital to understand the operation of computer hardware and peripherals. The reason for this is that software and hardware are inextricably connected in a symbiotic relationship. First, however, we need to identify types of software and their relationship to each other and, ultimately, to the hardware.
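Figure 1 Software Hierarchy (layers, from top to bottom: Application Software, System Software, Firmware, Hardware, Computer Logic Circuits)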
Figure 1 represents the relationship between the various types of software and hardware.
The figure appears as an inverted pyramid to reflect the relative size and number of the
various types of software, on one hand, and their proximity to computer hardware, on the
other.
First, application software is remote from, and rarely interacts with, the computer's hardware. This is particularly true of applications that run on modern operating systems such as Windows NT 4.0, 2000, XP and UNIX variants. By 2000, with the advent of the .NET and Java paradigms, applications became even further removed from hardware, as .NET's Common Language Runtime (CLR) and the Java Virtual Machine (JVM) provide operating system and hardware access.
Indeed, from the operating system's perspective, the JVM and the CLR are merely applications. Older operating systems such as MS-DOS permitted some direct interaction between applications (chiefly computer games) and the hardware; however, this meant that vendors of such applications had to write code that would interact with the computer's BIOS (Basic Input/Output System), firmware based on the computer's read only memory (ROM) integrated circuits. Indeed, when computers first appeared on the market this practice was the norm, rather than the exception. Application programmers soon tired of reinventing the wheel every time they wrote an application, as they had to include software routines that helped the application communicate with and control hardware devices, including the CPU (central processing unit). To overcome this, computer scientists focused on developing a new type of software, operating or system software, whose sole purpose was to provide an environment or interface for applications that removed the burden of managing and communicating with the computer hardware from application programs. This proved important as technological advances made computer hardware more sophisticated and difficult to manage. Thus, operating systems were developed to manage a computer's hardware resources and to provide an application programming interface, as well as a user or administrator interface, permitting application software programmers and systems administrators to access the hardware for use and configuration.
In early computer systems, bootstrap code, loaded into the system either manually via switches and/or from pre-coded punched cards or teletype tape, was required to load the operating system program and boot the system so that an application program could be loaded and run. The advent of read only memory (ROM) in the 1[...] However, firmware came into its own in microprocessor systems and, later, personal computers. By the turn of the new millennium, entire operating systems, such as Windows NT and Linux, appeared in the firmware of embedded systems. The most recent advances in this area have been in the PDA or Pocket PC market, where Palm OS and Microsoft Windows CE are competing for dominance. That said, while almost every type of electronic device possesses firmware of one form or other, the most prevalent appears in personal computers (PCs). Likewise, PCs dominate the computer market due to their presence in all areas of human activity. Hence, understanding PC hardware has become a sine qua non for all who call themselves IT professionals. The remainder of this chapter therefore focuses on delineating the basic architecture of today's PC.
A Brief Look Under the Hood of Today's PC
This section provides a brief examination of the major components of the PC.
The Power Supply
The most oft-ignored of the PC's components is the system power supply. Most household electrical appliances operate on alternating current (AC): for example, 110 Volt 60 Hz AC in the USA, or 220 Volt 50 Hz AC in Europe. However, electronic subassemblies or entire devices with embedded logic circuitry, whether microprocessor-based or not, operate exclusively on direct current (DC). The job of a PC's power supply is to transform and rectify the external AC commercial supply into the range of DC voltages required by the computer logic, associated electronic components, the DC motors in the hard disk, floppy, CD-ROM and DVD drives, and the system fans. Typical DC power supplies in a PC are rated at 1.5, 3.3, 5, -5, 12 and -12 volts. Also note that, as Notebook and Laptop computers have a rechargeable DC battery, they require special DC-DC converters to generate the required range of DC voltages. Several colour-designated cables emanate from a computer's power supply unit, the largest of which is connected to the computer's main circuit board, called the motherboard. The various DC voltages are distributed via the power supply rails printed onto the circuit board.
The Basic Input/Output System
The Basic Input/Output System (BIOS) is system software: a collection of hardware-related software routines embedded as firmware on a read-only memory (ROM) integrated circuit (IC) ‘chip’, which is typically housed on a computer's motherboard. Usually referred to as ROM BIOS, this software component provides the most fundamental means for the operating system to communicate with the hardware. However, most BIOSs are 16 bit programs and must operate in real mode¹ on machines with Intel processors. While this does not cause performance problems during the boot-up phase, it degrades PC performance whenever an operating system calls BIOS routines, because the CPU must switch from protected to real mode. 32 bit BIOSs are presently in use, but are not widespread. Modern 32 bit operating systems such as Linux do not use the BIOS after boot-up, as the designers of Linux integrated 32 bit versions of the BIOS routines into the Linux kernel. Hence, the cost of real mode switches in the CPU is avoided. Nevertheless, the BIOS plays a critical role during the boot-up phase, as it performs the power-on self test (POST) for the computer and then loads the boot code from the hard disk's master boot record (MBR), which in turn copies the system software into RAM and passes control of the CPU to it.
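As a rough illustration of the BIOS-to-MBR hand-off, the sketch below checks whether a 512-byte sector carries the conventional PC boot signature (0x55, 0xAA in its last two bytes), which a BIOS looks for before jumping to the boot code. The 446/64/2-byte layout is the classic MBR format; the function name `looks_like_mbr` is hypothetical, and the sector is synthetic rather than read from a real disk.

```python
SECTOR_SIZE = 512              # the MBR occupies the first 512-byte sector of the disk
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510 and 511 of a bootable MBR

def looks_like_mbr(sector: bytes) -> bool:
    """Return True if `sector` is 512 bytes long and ends with the 0x55AA boot signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE

# Synthetic sector: 446 bytes of (empty) boot code, a 64-byte (empty)
# partition table, then the two-byte signature.
boot_code = bytes(446)
partition_table = bytes(64)
sector = boot_code + partition_table + BOOT_SIGNATURE

print(looks_like_mbr(sector))       # True
print(looks_like_mbr(bytes(512)))   # False: signature missing
```

A real BIOS performs this check in firmware and then executes the boot code it has copied into RAM; the sketch only shows the data layout being tested.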
When a computer is first turned on, DC voltages are applied to the CPU and associated electrical and logic circuits. This would lead to electronic mayhem if the CPU did not assert control. However, a CPU is merely a collection of hundreds of thousands (and now millions) of logic circuits. CPU designers therefore built in a predetermined sequence of programmed electronic events, which are triggered when a signal appears on the CPU's reset pin. This causes the CPU's control unit to use the memory address in the instruction counter (IC) register to fetch the first instruction to be executed. The value placed in the IC at reset points into the final 64 KB segment of the first 1 MB of the computer's address space (on the 8086/8088, physical address FFFF0h, 16 bytes below the 1 MB boundary). This is a hangover from the early days of the PC, when the last 384 KB of the first 1 MB of RAM was reserved for the system and peripheral BIOS routines, each of which was 64 KB in length. This is the address of the first of the many 16
¹ 16 bit applications operate in real mode on all Intel CPUs. This effectively limits the address space to 1 MB, by using 16 x 64 KB program segments. Each 16 bit application can only address 64 KB (2^16 = 65,536 locations); however, the CPU manages and uses an extra 4 bits (address lines) to provide the BIOS, OS and applications with 16 (2^4 = 16) segment addresses.
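The footnote's arithmetic can be checked directly. The sketch below also applies the segment:offset rule that real-mode x86 processors use to form a 20 bit physical address (segment x 16 + offset), which is how overlapping 64 KB segments are placed within the 1 MB space, and shows that the 8086/8088 start-up address (CS=FFFFh, IP=0000h) lands at FFFF0h, inside the final 64 KB segment. The function name `real_mode_address` is a hypothetical illustration.

```python
SEGMENT_SIZE = 2 ** 16    # a 16 bit offset reaches 65,536 locations (64 KB)
ADDRESS_SPACE = 2 ** 20   # 16 bit offsets plus 4 extra address bits = 1 MB

def real_mode_address(segment: int, offset: int) -> int:
    """Physical address = segment * 16 + offset, wrapped to 20 bits as on the 8086/8088."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) % ADDRESS_SPACE

print(16 * SEGMENT_SIZE == ADDRESS_SPACE)       # True: 16 x 64 KB = 1 MB
print(hex(real_mode_address(0xFFFF, 0x0000)))   # 0xffff0: 8086/8088 reset address
print(hex(real_mode_address(0xF000, 0xFFF0)))   # 0xffff0: same byte via another segment
```

The last two lines illustrate that different segment:offset pairs can name the same physical byte, because segments start every 16 bytes rather than every 64 KB.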
Figure 2 The Intel 845 Chipset
The Motherboard and the Chipset
The motherboard or system board houses all system components, from the CPU, RAM and expansion slots (e.g. ISA and PCI) to the I/O controllers. However, the key component on a motherboard is the chipset. While motherboards are identified physically by their form factor, the chipset designation indicates the capability of the motherboard to house system components. The most popular form factor is ATX, which Intel designed to increase air movement for cooling on-board components and to allow easier access to the CPU and RAM. While the motherboard contains many chips or ICs, such as the CPU, RAM, BIOS and a variety of smaller chips, two chips now handle most of the I/O functionality of a PC. The first is the Northbridge chip, which handles all communication (address, data and control) to the CPU, RAM, Accelerated Graphics Port (AGP) and PCI devices. The frontside system bus (FSB) terminates on the Northbridge chip and permits the CPU to access the RAM, AGP and PCI devices, and those serviced by the Southbridge chip (and vice versa). The Southbridge chip permits communication with slow peripherals such as the floppy disk drive, the hard disk drive/CD-ROM, ISA devices, the parallel, serial, mouse and keyboard ports, and the Flash ROM BIOS.
Figure 3 The Intel 850 Chipset
Intel and VIA are the leaders in chipset manufacture as of 2002, although there are several other manufacturers, such as ALi and SiS. While Intel services its own CPUs, VIA manufactures chipsets for both Intel CPUs and those of Intel's major competitor, AMD. In 2002, the basic Intel i850 chipset consisted of the 82850 Northbridge MCH (Memory Controller Hub) and an ICH2 (I/O Controller Hub) Southbridge. The chipset also contains a Firmware Hub (FWH) that provides access to the Flash ROM BIOS. This permits up to 4 GB of RAM with ECC (error correction), 4X AGP mode, 4 Ultra ATA/100 IDE disk drives and four USB ports. ISA is not supported. Different chipset designs support different RAM types and speeds (e.g. DDR SDRAM or RAMBus DRAM), CPU types and packaging, system bus speeds, and so on.
In 2000, Intel announced that the future of RAM in the PC industry was RAMBus DRAM (RDRAM). This heralded the release of the Intel 820 ‘Camino’ chipset, which supported three RAMBus memory slots. However, errors in the design meant that only two memory slots could be used. A loss of confidence in the marketplace led to the withdrawal of the ill-fated Camino and its replacement with the Intel 840 ‘Carmel’ chipset. This includes a 64 bit PCI controller, a redesigned and improved RDRAM memory repeater, and an SDRAM memory repeater that converts the RDRAM protocol to SDRAM. This was a smart move by Intel, but it backfired terribly, as the SDRAM hub had design errors that limited the number of SDRAMs that could be used. In addition, the RDRAM-to-SDRAM conversion protocol impaired overall memory throughput when using SDRAM. Consequently, faster memory performance on Intel's Pentium III Coppermine CPUs with a 133 MHz Front Side Bus could only be achieved using VIA's Apollo Pro 133A. To make matters worse, the Intel 815 ‘Solano’ chipset, which was introduced to support PC 133 DIMMs (SDRAM memory modules) and to help regain market share from VIA, would not allow SDRAM modules to work at 133 MHz if CPUs rated for a 100 MHz external clock rate (such as certain variants of Intel's Pentium III) were fitted on the motherboard. This particularly applies to the Celeron family, which ran at a 66 MHz external clock rate. It is significant that many of Intel's competitors promoted the PC133 and PC 266 DIMM standards over the more expensive RAMBus DRAM. This further impeded the acceptance of RDRAM; however, by late 2002, RDRAM had its own market niche as the price of SDRAM increased once more.
Intel learned from its experience with the Camino and Carmel chipsets. Bowing to market pressure, it designed two new chipset families for use with its new Pentium IV CPU. The first of these, the i845 (see Figure 2), was targeted at systems based on the Pentium IV and synchronous DRAM memory such as the PC133, 233 and 333, with up to 3 GB of memory. The i850 (see Figure 3) was targeted at RDRAM-based systems of up to 4 GB, and supported PC 800, 1033 and 1066 RAMBus memory. In late 2002, the Intel 845GE chipset was released to support PC333 DDR SDRAM and the Pentium 4 processor. The chipset also included Intel's Extreme Graphics technology, which ran at a 266 MHz core speed. The basic member of the Intel 850 chipset family had support for PC800 RDRAM memory and provided a balanced performance platform for the Pentium 4 processor with its 400 MHz system bus and NetBurst™ Architecture. It also supports dual channel access to RDRAM RIMMs, which increases overall throughput to 3.2 GB/s. Subsequent developments in this chipset family provided support for RDRAM running at 1033 MHz and 1066 MHz, and for a 533 MHz FSB. Further advances in DDR SDRAM technologies saw DDR SDRAM-based Intel and VIA chipsets which accommodated PC2400 and PC2700 DDR SDRAM, running at 150 MHz and 166 MHz respectively and double clocked to 300 and 333 MHz (so-called DDR 300 and 333). However, the evolution of DDR366 and chipset design led to PC3000 DDR SDRAM being released with even higher bandwidth.
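The bandwidth figures quoted above follow from a simple product: clock rate x transfers per clock x bus width x number of channels. The sketch below assumes (these widths are not stated in the text) that each RDRAM channel is 16 bits wide and that a DDR SDRAM DIMM is 64 bits wide, with both memory types transferring data twice per clock. `peak_bandwidth_gb_s` is a hypothetical helper, and the results are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(clock_mhz: float, transfers_per_clock: int,
                        bus_width_bits: int, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer * channels / 1e9

# Dual-channel PC800 RDRAM: 400 MHz clock, double-pumped, 16 bit channels.
print(round(peak_bandwidth_gb_s(400, 2, 16, channels=2), 2))   # 3.2
# PC2400 DDR SDRAM: 150 MHz clock, double-pumped, 64 bit DIMM.
print(round(peak_bandwidth_gb_s(150, 2, 64), 2))               # 2.4
# PC2700 DDR SDRAM: nominally 166 MHz (166.67), double-pumped, 64 bit DIMM.
print(round(peak_bandwidth_gb_s(166.67, 2, 64), 2))            # 2.67
```

The DDR module names encode this product in MB/s (PC2400 is about 2,400 MB/s), which is why PC2700 is rounded up from roughly 2,667 MB/s.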
Basic CPU Architectures
CISC vs. RISC
There are two types of fundamental CPU architecture: complex instruction set computers (CISC) and reduced instruction set computers (RISC). CISC is the most prevalent and established microprocessor architecture, while RISC is a relative newcomer. Intel's 80x86 and Pentium microprocessor families are CISC-based, although RISC-type functionality has been incorporated into Pentium CPUs. Motorola's 68000 family of microprocessors is another example of this type of architecture. Sun Microsystems' SPARC microprocessors and the MIPS R2000, R3000 and R4000 families dominate the RISC end of the market; however, Motorola's PowerPC and G4, Intel's i860, and Analog Devices Inc.'s digital signal processors (DSP) are also in wide use. In the PC/Workstation market, Apple Computers and Sun employ RISC microprocessors as their choice of CPU.
Table 1 CISC and RISC
CISC | RISC
Large instruction set | Compact instruction set
Complex, powerful instructions | Simple hard-wired machine code and control unit
Instruction sub-commands microcoded in on-board ROM | Pipelining of instructions
Compact and versatile register set | Numerous registers
Numerous memory addressing options for operands | Compiler and IC developed simultaneously
The difference between the two architectures lies in the relative complexity of the instruction sets and of the underlying electronic and logic circuits in CISC microprocessors. For example, the original RISC I prototype had just 31 instructions, while the RISC II had 3[...]
Figure 4 Typical Microprocessor Architectures (labels: Bus Interface Unit; Program Counter; Stack Pointer; general purpose registers AX, BX, CX and DX, where AX is the Accumulator; BP, SI and DI; Flag register; Instruction Register; Internal Bus; Decode Unit; Arithmetic and Logic Unit; Control Unit; Address Bus, Data Bus and Control Bus, the last including read/write, interrupt, clock and reset lines)
the 1[...]
several integrated transistors, which are configured as flip-flop circuits, each of which can be switched into a 1 or 0 state. They remain in that state until changed under control of the CPU or until the power is removed from the processor. Each register has a specific name and is addressable; some, however, are dedicated to specific tasks, while the majority are ‘general purpose’. The width of a register depends on the type of CPU, e.g. a 16, 32 or 64 bit microprocessor. In order to provide backward compatibility, registers may be sub-divided. For example, the Pentium processor is a 32 bit CPU, and its registers are 32 bits wide. Some of these are sub-divided and named as 8 and 16 bit registers in order to run 8 and 16 bit applications designed for earlier x86 microprocessors.
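The sub-division of registers can be sketched with bit masks. The example assumes the usual x86 naming, where the 32 bit EAX contains the 16 bit AX (its low half), which in turn is split into the 8 bit AH (high byte) and AL (low byte); `sub_registers` is a hypothetical helper, not part of any real API.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32 bit EAX value into its 16 bit AX and 8 bit AH/AL views."""
    eax &= 0xFFFFFFFF          # keep the value within 32 bits
    ax = eax & 0xFFFF          # low 16 bits: the register 16 bit programs see
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}

regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
```

Because the smaller names are simply views onto the same flip-flops, an 8 or 16 bit program that writes AL or AX changes part of EAX as well.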
Instruction Register
When the Bus Interface Unit receives an instruction it transfers it to the Instruction Register for temporary storage. In Pentium processors the Bus Interface Unit instead transfers instructions to the L1 I-Cache; there is no instruction register as such.
Stack Pointer
A ‘stack’ is a small area of reserved memory used to store the data in the CPU's registers when: (1) a process makes system calls to operating system routines; (2) hardware interrupts are generated by input/output (I/O) transactions on peripheral devices; (3) a process initiates an I/O transfer; (4) a process rescheduling event occurs on foot of a hardware timer interrupt. This transfer of register contents is called a ‘context switch’. The stack pointer is the register which holds the address of the most recent ‘stack’ entry. Hence, when a process makes a system call (to, say, print a document) and its context is stored on the stack, the called system routine uses the stack pointer to reload the register contents when it has finished printing. Thus the process can continue where it left off.
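The save-and-restore sequence described above can be sketched with a toy CPU. The class below is hypothetical and greatly simplified: it assumes a stack that grows downward in memory (as on x86), stores one register value per memory slot, and pushes every register in a fixed order so that popping in reverse order restores them.

```python
class Cpu:
    """Toy CPU: a few named registers plus a stack that grows downward in memory."""

    def __init__(self, stack_top: int = 0x1000):
        self.regs = {"AX": 0, "BX": 0, "CX": 0, "DX": 0, "PC": 0}
        self.memory = {}         # address -> value (one register value per slot)
        self.sp = stack_top      # stack pointer: address of the most recent stack entry

    def push(self, value: int) -> None:
        self.sp -= 1             # move down to a free slot, then store
        self.memory[self.sp] = value

    def pop(self) -> int:
        value = self.memory.pop(self.sp)
        self.sp += 1             # release the slot
        return value

    def save_context(self) -> None:
        for name in self.regs:   # push every register in a fixed order
            self.push(self.regs[name])

    def restore_context(self) -> None:
        for name in reversed(list(self.regs)):   # last in, first out
            self.regs[name] = self.pop()

cpu = Cpu()
cpu.regs.update(AX=7, BX=42, PC=0x0200)   # state of the interrupted process
cpu.save_context()                         # e.g. on a system call or interrupt
cpu.regs.update(AX=0, BX=0, PC=0x9000)    # the OS routine reuses the registers
cpu.restore_context()                      # routine finished: reload saved state
print(cpu.regs["AX"], cpu.regs["BX"], hex(cpu.regs["PC"]))   # 7 42 0x200
print(cpu.sp == 0x1000)                    # True: the stack is balanced again
```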
Instruction Decoder
The Instruction Decoder is an arrangement of logic elements which act on the bits that constitute the instruction. Simple instructions with corresponding logic hard-wired into the execution unit are simply passed to the Execution Unit (and/or the MMX unit in the Pentium II, III and IV); complex instructions are decoded so that related microcode modules can be transferred from the CPU's microcode ROM to the execution unit. The Instruction Decoder will also store referenced operand addresses in appropriate registers so that the data at the referenced memory locations can be fetched.
Program or Instruction Counter
The Program Counter (PC) is the register that stores the address in primary memory (RAM or ROM) of the next instruction to be executed. In 32 bit systems, this is a 32 bit linear or virtual memory address that references a byte (the first of the 4 required to store a 32 bit instruction) in the process's virtual memory address space. This value is translated to determine the real memory address at which the instruction is stored. When the referenced instruction is fetched, the address in the PC is incremented to the address of the next instruction to be executed. If the current address is 00B0 hex, then the next address will be 00B4 hex. Remember that each byte in RAM is individually addressable; however, in this simplified model each complete instruction is 32 bits or 4 bytes long, so the address of the next instruction in the process will be 4 bytes on.
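The increment described above can be traced in a few lines. The sketch keeps the text's simplifying assumption that every instruction is exactly 4 bytes; real x86 instructions vary in length (up to 15 bytes), so an actual CPU adds the length of the instruction it has just decoded. `next_pc` is a hypothetical helper.

```python
INSTRUCTION_BYTES = 4   # assumption from the text: every instruction is 32 bits long

def next_pc(pc: int, instruction_bytes: int = INSTRUCTION_BYTES) -> int:
    """Advance a byte-addressed program counter past one fixed-length instruction."""
    return (pc + instruction_bytes) & 0xFFFFFFFF   # wrap within a 32 bit address space

pc = 0x00B0
trace = []
for _ in range(3):
    trace.append(f"{pc:04X}")
    pc = next_pc(pc)
print(trace)   # ['00B0', '00B4', '00B8']
```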
Accumulator
The accumulator may contain data to be used in a mathematical or logical operation, or it
may contain the result of an operation. General purpose registers are used to support the accumulator by holding data to be loaded to/from the accumulator.
Computer Status Word or Flag Register
The result of an ALU operation may have consequences for subsequent operations; for example, it may change the path of execution. Individual bits in this register are set or reset in accordance with the result of mathematical or logical operations. Each bit, also called a flag, has a preassigned meaning, and the contents are monitored by the control unit to help control CPU-related actions.
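To make the idea of flags concrete, the sketch below adds two 8 bit values and derives four status bits that x86 processors keep in their flag register: carry (CF), zero (ZF), sign (SF) and overflow (OF). It is a software model for illustration, not how the ALU's logic is wired; a conditional jump such as "jump if zero" would simply test `flags["ZF"]`.

```python
def add8_with_flags(a: int, b: int) -> tuple[int, dict]:
    """Add two 8 bit values and derive x86-style status flags from the result."""
    a &= 0xFF
    b &= 0xFF
    raw = a + b
    result = raw & 0xFF
    flags = {
        "CF": raw > 0xFF,            # carry out of bit 7 (unsigned overflow)
        "ZF": result == 0,           # result is zero
        "SF": bool(result & 0x80),   # bit 7 (the sign bit) of the result is set
        # signed overflow: operands share a sign that differs from the result's sign
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags

print(add8_with_flags(0xFF, 0x01))   # (0, {'CF': True, 'ZF': True, 'SF': False, 'OF': False})
print(add8_with_flags(0x7F, 0x01))   # (128, {'CF': False, 'ZF': False, 'SF': True, 'OF': True})
```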
Arithmetic and Logic Unit
The Arithmetic and Logic Unit (ALU) performs all arithmetic and logic operations in a microprocessor, viz. addition, subtraction, logical AND, OR, EX-OR, etc. A typical ALU is connected to the accumulator and general purpose registers, and to other CPU components that help transfer the result of its operations to RAM via the Bus Interface Unit and the system bus. The results may also be written into internal or external caches.
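A minimal software model of the operations listed above, assuming an 8 bit ALU: each operation takes two operands (for example, the accumulator and another register) and keeps only the low 8 bits of the result, so subtraction wraps around as in two's complement hardware. The names `ALU_OPS` and `alu` are hypothetical.

```python
import operator

MASK = 0xFF   # assume an 8 bit ALU for the example

ALU_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,   # the text's EX-OR
}

def alu(op: str, a: int, b: int) -> int:
    """Apply one ALU operation to two 8 bit operands and keep the low 8 bits."""
    return ALU_OPS[op](a & MASK, b & MASK) & MASK

accumulator = 0b1100_1010
for op in ("ADD", "SUB", "AND", "OR", "XOR"):
    print(op, format(alu(op, accumulator, 0b0000_1111), "08b"))
# ADD 11011001
# SUB 10111011
# AND 00001010
# OR 11001111
# XOR 11000101
```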
Control Unit
The control unit coordinates and manages CPU activities, in particular the execution of instructions by the arithmetic and logic unit (ALU). In Pentium processors its role is complex, as microcode from decoded instructions is pipelined for execution by two ALUs.
The System Clock
The Intel 8088 had a clock speed of 4.77 MHz; that is, its internal logic gates were opened and closed under the control of a square wave pulsed signal that had a frequency of 4.77 million cycles per second. Alternatively put, the logic gates opened and closed 4.77 million times per second. Thus, instructions and data were pumped through the integrated transistor logic circuits at a rate of 4.77 million bits per second on each signal line. Later designs ran at higher speeds, viz. the i286 at 8-20 MHz, the i386 at 16-33 MHz and the i486 at 25-50 MHz. Where does this clock signal come from? Each motherboard is fitted with a quartz oscillator in a metal package that generates a square wave clock pulse of a certain frequency. In i8088 systems the crystal oscillator ran at 14.318 MHz, and this was fed to the i8284 to generate the system clock frequency of 4.77 MHz in earlier systems, rising to 10 MHz in later designs. Later, i286 PCs had a 12 MHz crystal which provided the i82284 multiplier/divider IC with the primary clock signal. This then divided/multiplied the basic 12 MHz to generate the system clock signal of 8-20 MHz. With the advent of the i486DX, the system clock signal, which ran at 25 or 33 MHz, was effectively multiplied by factors of 2 and 3 to deliver internal CPU clock speeds of 50, 66, 75 and 100 MHz. This approach is used in Pentium IV architectures, where the primary crystal source delivers a relatively slow 50 MHz clock signal that is then multiplied to the system clock speed of 100-133 MHz. The internal multiplier in the Pentium then multiplies this by a factor of 20 or more to obtain speeds of 2 GHz and above.
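The clock arithmetic above can be checked with a short sketch. It assumes that the 8088's 4.77 MHz comes from dividing the 14.318 MHz crystal by three (the text says only that the crystal was fed to the i8284), and it treats the internal CPU clock as simply bus clock x multiplier. Both helper names are hypothetical.

```python
CRYSTAL_MHZ = 14.31818   # the PC's 14.318 MHz crystal

def core_clock_mhz(bus_mhz: float, multiplier: float) -> float:
    """Internal CPU clock = external (system bus) clock x internal multiplier."""
    return bus_mhz * multiplier

def period_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1_000 / clock_mhz

print(round(CRYSTAL_MHZ / 3, 2))     # 4.77: crystal divided by 3 gives the 8088 clock
for bus in (25, 33):                 # i486-style multiplication of the bus clock
    print(bus, [core_clock_mhz(bus, m) for m in (2, 3)])   # 25 [50, 75] then 33 [66, 99]
print(core_clock_mhz(100, 20))       # 2000 MHz, i.e. 2 GHz
print(round(period_ns(4.77), 1))     # 209.6 ns per cycle at 4.77 MHz
print(period_ns(2000))               # 0.5 ns per cycle at 2 GHz
```

The 33 x 3 = 99 MHz result is the part sold as a 100 MHz i486; the last two lines show how much shorter each clock cycle became between the 8088 and a 2 GHz Pentium IV.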
Instruction Cycle
An instruction cycle consists of the activities required to fetch and execute an instruction. The length of time taken to fetch and execute is measured in clock cycles. In CISC processors this will take many clock cycles, depending on the complexity of the instruction and the number of memory references made to load operands. In RISC computers the number of clock cycles is reduced significantly. When the CPU finishes the execution of an instruction it transfers the content of the program counter into the Bus Interface Unit (1 clock cycle). This is then gated onto the system address bus and the read signal is asserted on the control bus (1 clock cycle). This signals the RAM controller that the value at this address is to be read from memory and loaded onto the data bus (4+ clock cycles). The instruction is read in from the data bus and decoded (2+ clock cycles). The fetch and decode activities constitute the first machine cycle of the instruction cycle. The second machine cycle begins when the instruction's operand is read from RAM and ends when the instruction is executed and the result written back to memory. This will take at least another 8 clock cycles, depending on the complexity of the instruction. Thus an instruction cycle will take at least 16 clock cycles (1 + 1 + 4 + 2 + 8), a considerable length of time. Together, RISC processors and fast RAM can keep this to a minimum. However, Intel made advances by super pipelining instructions, that is, by interleaving fetch, decode, operand read, execute and retire (i.e. write the result of the instruction to RAM) activities into two separate pipelines serving two ALUs. Hence, instructions are not executed sequentially, but concurrently and in parallel (more about pipelining later).
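The minimum cycle counts in the paragraph above add up as follows. The pipelined figure uses a deliberately idealised model with hypothetical parameters (5 stages of 4 clock cycles each, a single pipeline, no stalls); it only shows why overlapping stages raises throughput and does not model any particular Intel CPU.

```python
# Minimum clock cycles per phase, taken from the figures in the text.
PHASES = {
    "transfer PC to bus interface unit": 1,
    "put address on bus, assert read": 1,
    "RAM places instruction on data bus": 4,
    "read in and decode instruction": 2,
    "read operand, execute, write back": 8,
}
cycles_per_instruction = sum(PHASES.values())

def sequential_cycles(n_instructions: int) -> int:
    """No overlap: each instruction completes its full cycle before the next starts."""
    return n_instructions * cycles_per_instruction

def ideal_pipeline_cycles(n_instructions: int, stages: int, cycles_per_stage: int) -> int:
    """Idealised pipeline: fill it once, then one instruction completes per stage time."""
    return (stages + n_instructions - 1) * cycles_per_stage

print(cycles_per_instruction)                                    # 16
print(sequential_cycles(100))                                    # 1600
print(ideal_pipeline_cycles(100, stages=5, cycles_per_stage=4))  # 416
```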
5th and 6th Generation Intel CPU Architecture
The Pentium microprocessor was the last of Intel's 5th generation microprocessors and had several basic units: the Bus Interface Unit (BIU); the I-Cache (8 KB of write-through Static RAM, or SRAM); the Instruction Translation Lookaside Buffer (TLB); the D-Cache (8 KB of write-back SRAM); the Data TLB; the Clock Driver/Multiplier; the Instruction Fetch Unit; the Branch Prediction Unit; the Instruction Decode Unit; the Complex Instruction Support Unit; the Superscalar Integer Execution Unit; and the Pipelined Floating Point Unit. Figure 5 presents a block diagram of the original Pentium.
The Pentium was the first Intel chip to have a 64 bit external data bus, which was split internally into two separate pipelines, each 32 bits wide. This allowed the Pentium to execute two instructions simultaneously; moreover, more than one instruction could be in each pipeline at a time, thus increasing instruction throughput.
Heat dissipation is the enemy of chip designers: the greater the number of integrated transistors, and the higher the speed of operation and the operating voltage, the more power is consumed and the more heat is generated. The first two Pentium versions ran at 60 and 66 MHz respectively with an operating voltage of 5 V DC. Hence they ran quite hot. However, a change in package design from Socket 5 to Socket 7 (Pin Grid Array, PGA) and a reduction in operating voltage to 3.3 Volts lowered power consumption and heat dissipation. Intel also introduced a clock multiplier which multiplied the external clock signal and enabled the Pentium to run at 1.5, 2, 2.5 and finally 3 times this speed. Thus, while the system bus ran at 50, 60 and 66 MHz, the CPU ran at 75-200 MHz.
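Why the voltage cut mattered can be estimated with the standard first-order model of CMOS dynamic power, P = C x V^2 x f, which the text does not state. The sketch assumes the switched capacitance C stays the same between chips (in reality it changes with the design), so it gives ratios only; `relative_dynamic_power` is a hypothetical helper.

```python
def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Ratio P_new / P_old for dynamic power P = C * V**2 * f, assuming equal C."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Same clock, 3.3 V instead of 5 V: power falls to about 44 % of its former value.
print(round(relative_dynamic_power(3.3, 5.0), 3))             # 0.436
# 3.3 V at 200 MHz versus 5 V at 66 MHz: triple the clock costs only about 1.3x power.
print(round(relative_dynamic_power(3.3, 5.0, 200, 66), 2))    # 1.32
```

Because voltage enters squared, lowering it was the main lever that let clock multipliers raise speed without a proportional rise in heat.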
In 1[...] However, major design changes came with the Pentium IV. Modifications and design changes centered on: a) the physical package; b) the process by which instructions were decoded and executed; c) support for memory beyond the 4 GB limit; d) the integration and enhancement of L1 and L2 cache performance and size; e) the addition of a new cache; f) the speed of internal and external operation. Each of these issues receives attention in the following subsections.
Figure 5 Pentium CPU Block Diagram (components shown: Bus Interface Unit; address, data and control buses; caches; Branch Target Buffer; Prefetch Buffer; Fetch and Decode Unit; Control Unit; Microcode ROM; Clock Multiplier; Dual Pipeline Execution Unit; Registers; Floating Point Unit; Advanced Programmable Interrupt Controller)
Physical Packaging
Two terms are employed to describe the packaging used for the Pentium family of processors: the first refers to the motherboard connection, and the second to the actual package itself. For example, the original Pentium P5 was fitted to the Socket 5 type connection on the motherboard using a Staggered Pin Grid Array (SPGA) for the die's I/O (die is the technical term for the physical structure that incorporates the chip). Later variants used the Socket 7 connector. The Pin Grid Array (PGA) family of packages is associated with different Socket types, which are numbered. A pin grid array is simply an array of metal pin connectors used to form an electrical connection between the internal electronics of the CPU (packaged on the die) and other system components like the system chipsets. The pins plug into corresponding receptacle pinholes in the CPU's socket on the motherboard. The different types of PGA reflect the type of packaging (e.g. ceramic or plastic), the number of pins, and how they are arrayed. The Pentium Pro used an SPGA with a staggering 387 pins for connection to the motherboard socket, called Socket 8. The Pentium Pro was the first Intel processor to have an L2 cache connected to the CPU via a backside bus, but on a separate die. This was a significant technical achievement in packaging. When Intel designed the Pentium II they decided to change the packaging significantly and introduced a Single Edge Contact Cartridge (SECC) package with three variants (SECC for the Pentium II, SECC2 for the Pentium II and SEPP for the Celeron), each of which plugged into the Slot 1 connector on the motherboard. However, later variants of the Celeron and Pentium III used PGA packaging for certain applications: the Celeron uses the Plastic PGA, and the Celeron III and Pentium III the Flip-Chip Pin Grid Array (FC-PGA). Both use the 370-pin Socket. The Pentium IV saw a full return to the PGA for all chips. Here a Flip-Chip Pin Grid Array (FC-PGA) was employed in a 478-pin PGA package.
Overall Architectural Comparison of the Pentium Family of Microprocessors
The Pentium (P54) first shipped in 1[...]
variants, and the Pentium III. As indicated, the physical package was also a significant advance, as was the incorporation of additional RISC features. However, aimed as it was at the server market, the Pentium Pro did not incorporate MMX technology. It was expensive to produce, as it included the L2 cache on its substrate (but on a separate die), and it had 5.5 million transistors at its core and over 8 million in its L2 cache. Its core logic operated at 3.3 Volts. The microprocessor was still, however, chiefly CISC in design, and optimized for 32 bit operation. The chief features of the Pentium Pro were:
• A partly integrated L2 cache of up to 512 KB on a specially manufactured SRAM (separate die) that was connected via a dedicated ‘backside’ bus running at full CPU speed.
• Three 12-stage pipelines.
• Speculative execution of instructions.
• Out-of-order completion of instructions.
• 40 renamed registers.
• Dynamic branch prediction.
• Multiprocessing with up to 4 Pentium Pros.
• An increased address bus size of 36 bits (up from 32) to enable up to 64 GB of memory to be used. (Please note that the 4 extra bits multiply the number of addressable memory locations by 2^4 = 16; this gives 4 GB x 16 = 64 GB of memory.)
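The address-space arithmetic in the last item can be checked directly: each extra address line doubles the number of byte addresses, so n lines reach 2^n bytes. `addressable_bytes` is a hypothetical helper, and GB here means 2^30 bytes.

```python
GIB = 2 ** 30   # one gigabyte in the binary sense used for memory sizes

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with `address_bits` address lines."""
    return 2 ** address_bits

print(addressable_bytes(32) // GIB)                     # 4: GB with 32 address bits
print(addressable_bytes(36) // GIB)                     # 64: GB with 36 address bits
print(addressable_bytes(36) // addressable_bytes(32))   # 16: the factor from 4 extra bits
```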
The following description is taken from Intel's introduction to its microprocessor architecture and is relevant to all members of the P6 family, including the Celeron, Pentium II and III.
The Intel Pentium Pro processor has a three-way superscalar architecture. The term “three-way superscalar” means that, using parallel processing techniques, the processor is able on average to decode, dispatch, and complete execution of (retire) three instructions per clock cycle. To handle this level of instruction throughput, the Pentium Pro processor uses a decoupled, 12-stage superpipeline that supports out-of-order instruction execution. It does this by incorporating even more parallelism than the Pentium processor. The Pentium Pro processor provides Dynamic Execution (micro-data flow analysis, out-of-order execution, superior branch prediction, and speculative execution) in a superscalar implementation.
The centerpiece of the Pentium Pro processor architecture is an innovative out-of-order execution mechanism called “dynamic execution.” Dynamic execution incorporates three data-processing concepts:
• Deep branch prediction.
• Dynamic data flow analysis.
• Speculative execution.
Branch prediction is a concept found in most mainframe and high-speed RISC microprocessor architectures. It allows the processor to decode instructions beyond branches to keep the instruction pipeline full. In the Pentium Pro processor, the instruction fetch/decode unit uses a highly optimized branch prediction algorithm to predict the direction of the instruction stream through multiple levels of branches, procedure calls, and returns.
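Intel's description does not give its prediction algorithm, so the sketch below uses a different, simpler textbook scheme as a stand-in: a 2-bit saturating counter per branch, where values 0-1 predict "not taken" and 2-3 predict "taken". The branch address 0x40 and the class name `TwoBitPredictor` are hypothetical. It shows why a loop branch is predicted well: only the first iteration and the final exit are mispredicted.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}   # branch address -> counter value 0..3

    def predict(self, branch_addr: int) -> bool:
        return self.counters.get(branch_addr, 1) >= 2   # start at 'weakly not taken'

    def update(self, branch_addr: int, taken: bool) -> None:
        c = self.counters.get(branch_addr, 1)
        self.counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)

# A loop branch at hypothetical address 0x40: taken 9 times, then falls through.
predictor = TwoBitPredictor()
outcomes = [True] * 9 + [False]
hits = 0
for taken in outcomes:
    hits += predictor.predict(0x40) == taken
    predictor.update(0x40, taken)
print(f"{hits}/{len(outcomes)} correct")   # 8/10 correct
```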
Figure Functional Block Diagram of the Pentium Pro Processor Microarchitecture
Dynamic data flow analysis involves real-time analysis of the flow of data through the processor to determine data and register dependencies and to detect opportunities for out-of-order instruction execution. The Pentium Pro processor dispatch/execute unit can simultaneously monitor many instructions and execute these instructions in the order that optimizes the use of the processor's multiple execution units, while maintaining the integrity of the data being operated on. This out-of-order execution keeps the execution units busy even when cache misses and data dependencies among instructions occur.
Speculative execution refers to the processor's ability to execute instructions ahead of the program counter but ultimately to commit the results in the order of the original instruction stream. To make speculative execution possible, the Pentium Pro processor microarchitecture decouples the dispatching and executing of instructions from the commitment of results. The processor's dispatch/execute unit uses data-flow analysis to execute all available instructions in the instruction pool and store the results in temporary registers. The retirement unit then linearly searches the instruction pool for completed instructions that no longer have data dependencies with other instructions or unresolved branch predictions. When completed instructions are found, the retirement unit commits the results of these instructions to memory and/or the Intel Architecture registers (the processor's eight general-purpose registers and eight floating-point unit data registers) in the order they were originally issued and retires the instructions from the instruction pool.
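The in-order commit step can be sketched as follows. This is a simplified Python model under stated assumptions: the `MicroOp` record and the `retire` function are hypothetical, the retirement width of three is chosen to match the three-way superscalar design described above, and the pool is scanned in program order rather than modelling the actual retirement hardware.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MicroOp:
    seq: int            # position in the original program order
    dest: str           # architectural register the result belongs to
    value: int          # result held in a temporary register until retirement
    done: bool = False  # set once an execution unit has produced `value`


def retire(pool: list[MicroOp], registers: dict[str, int], width: int = 3) -> list[int]:
    """Commit up to `width` finished micro-ops per call, oldest first.

    Scanning stops at the oldest micro-op that has not finished, even if
    younger ones already have, so architectural state is always updated in
    program order.
    """
    pool.sort(key=lambda op: op.seq)
    retired: list[int] = []
    while pool and pool[0].done and len(retired) < width:
        op = pool.pop(0)
        registers[op.dest] = op.value  # the only place architectural state changes
        retired.append(op.seq)
    return retired


regs = {"EAX": 0, "EBX": 0, "ECX": 0}
pool = [
    MicroOp(1, "EAX", 10),             # still executing (e.g. waiting on a cache miss)
    MicroOp(2, "EBX", 20, done=True),  # finished early, out of order
    MicroOp(3, "ECX", 30, done=True),
]
print(retire(pool, regs))  # [] because micro-op 1 is not finished
pool[0].done = True
print(retire(pool, regs))  # [1, 2, 3]
print(regs)                # {'EAX': 10, 'EBX': 20, 'ECX': 30}
```

Micro-ops 2 and 3 executed before micro-op 1, yet their results become architecturally visible only after micro-op 1 commits.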
Through deep branch prediction, dynamic data-flow analysis, and speculative execution, dynamic execution removes the constraint of linear instruction sequencing between the traditional fetch and execute phases of instruction execution. It allows instructions to be decoded deep into multi-level branches to keep the instruction pipeline full. It promotes out-of-order instruction execution to keep the processor's six instruction execution units running at full capacity. Finally, it commits the results of executed instructions in original program order to maintain data integrity and program coherency.
Three instruction decode units work in parallel to decode object code into smaller operations called "micro-ops" (microcode). These go into an instruction pool and, when interdependencies don't prevent it, can be executed out of order by the five parallel execution units (two integer, two FPU and one memory interface unit). The Retirement Unit retires completed micro-ops in their original program order, taking account of any branches.
The power of the Pentium Pro processor is further enhanced by its caches: it has the same two on-chip 8-KByte L1 caches as does the Pentium processor, and also has a 256-512 KByte L2 cache that's in the same package as, and closely coupled to, the CPU, using a dedicated 64-bit ("backside") full clock speed bus. The L1 cache is dual ported, the L2 cache supports up to 4 concurrent accesses, and the 64-bit external data bus is transaction-oriented, meaning that each access is handled as a separate request and response, with numerous requests allowed while awaiting a response. These parallel features for data access work with the parallel execution capabilities to provide a "non-blocking" architecture in which the processor is more fully utilized and performance is enhanced.
Pentium Pro Modes of Operation
The Intel Architecture supports three operating modes: protected mode, real-address mode, and system management mode. The operating mode determines which instructions and architectural features are accessible:
Protected mode: The native state of the processor. In this mode all instructions and architectural features are available, providing the highest performance and capability. This is the recommended mode for all new applications and operating systems. Among the capabilities of protected mode is the ability to directly execute "real-address mode" 8086 software in a protected, multi-tasking environment. This feature is called virtual-8086 mode, although it is not actually a processor mode. Virtual-8086 mode is actually a protected mode attribute that can be enabled for any task.
Real-address mode: Provides the programming environment of the Intel 8086 processor with a few extensions (such as the ability to switch to protected or system management mode). The processor is placed in real-address mode following power-up or a reset.
System management mode: A standard architectural feature unique to Intel processors, present in all of them beginning with the Intel386 SL processor. This mode provides an operating system or executive with a transparent mechanism for implementing platform-specific functions such as power management and system security. The processor enters SMM when the external SMM interrupt pin (SMI#) is activated or an SMI is received from the advanced programmable interrupt controller (APIC). In SMM, the processor switches to a separate address space while saving the entire context of the currently running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the system management interrupt.
The basic execution environment is the same for each of these operating modes.
Basic Pentium Execution Environment
Any program or task running on an Intel Architecture processor is given a set of resources for executing instructions and for storing code, data, and state information. These resources (shown in Figure 6) include an address space of up to 2^32 bytes, a set of general data registers, a set of segment registers, and a set of status and control registers. When a program calls a procedure, a procedure stack is added to the execution environment. Procedure calls and the procedure stack implementation are described in Chapter 4, "Procedure Calls, Interrupts, and Exceptions."
Figure 6 Basic Execution Environment
Pentium Pro Memory Organization
The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 2^32 - 1 (4 gigabytes). Virtually any operating system or executive designed to work with an Intel Architecture processor will use the processor's memory management facilities to access memory. These facilities provide features such as segmentation and paging, which allow memory to be managed efficiently and reliably. Memory management is described in detail later. The following paragraphs describe the basic methods of addressing memory when memory management is used. When employing the processor's memory management facilities, programs do not directly address physical memory. Instead, they access memory using any of three memory models: flat, segmented, or real-address mode.
With the flat memory model (see Figure 3-2), memory appears to a program as a single, continuous address space, called a linear address space. Code (a program's instructions), data, and the procedure stack are all contained in this address space. The linear address space is byte addressable, with addresses running contiguously from 0 to 2^32 - 1. An address for any byte in the linear address space is called a linear address.
With the segmented memory model, memory appears to a program as a group of independent address spaces called segments. When using this model, code, data, and stacks are typically contained in separate segments. To address a byte in a segment, a program must issue a logical address, which consists of a segment selector and an offset. (A logical address is often referred to as a far pointer.) The segment selector identifies the segment to be accessed and the offset identifies a byte in the address space of the segment. The programs running on an Intel Architecture processor can address up to 16,383 segments of different sizes and types, and each segment can be as large as 2^32 bytes (4 GB).
Internally, all the segments that are defined for a system are mapped into the processor's linear address space. So, the processor translates each logical address into a linear address to access a memory location. This translation is transparent to the application program.
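As a rough sketch of this translation, the Python below adds a segment's base address to the offset after a limit check. It is a deliberate simplification: the `SegmentDescriptor` type and the dictionary keyed directly by selector value are hypothetical stand-ins for the processor's descriptor tables, and privilege and type checks are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDescriptor:
    base: int   # linear address at which the segment starts
    limit: int  # largest valid offset within the segment


def to_linear(table: dict[int, SegmentDescriptor], selector: int, offset: int) -> int:
    """Translate a logical address (selector:offset) into a linear address.

    Simplified: `table` is keyed directly by selector value. Real hardware
    uses part of the selector as an index into a descriptor table and also
    checks privilege levels and segment type.
    """
    desc = table[selector]
    if offset > desc.limit:
        # The processor would raise a protection exception here.
        raise ValueError(f"offset {offset:#x} exceeds segment limit {desc.limit:#x}")
    return (desc.base + offset) & 0xFFFFFFFF  # linear addresses are 32 bits wide


# Hypothetical descriptors: a code segment and a separate, smaller stack segment.
segments = {
    0x08: SegmentDescriptor(base=0x00400000, limit=0xFFFF),
    0x10: SegmentDescriptor(base=0x00800000, limit=0x0FFF),
}
print(hex(to_linear(segments, 0x10, 0x20)))  # 0x800020
```

An offset of 0x2000 in segment 0x10 would be rejected by the limit check, which is the kind of protection discussed next: a stack confined to its own segment cannot overrun code or data.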
The primary reason for using segmented memory is to increase the reliability of programs and systems. For example, placing a program's stack in a separate segment prevents the stack from growing into the code or data space and overwriting instructions or data, respectively. And placing the operating system's or executive's code, data, and stack in separate segments protects them from the application program and vice versa.
With either the flat or segmented model, the Intel Architecture provides facilities for dividing the linear address space into pages and mapping the pages into virtual memory. If an operating system/executive uses the Intel Architecture's paging mechanism, the existence of the pages is transparent to an application program.
The real-address mode model uses the memory model for the Intel 8086 processor, the first Intel Architecture processor. It was provided in all the subsequent Intel Architecture processors for compatibility with existing programs written to run on the Intel 8086 processor. The real-address mode uses a specific implementation of segmented memory in which the linear address space for the program and the operating system/executive consists of an array of segments of up to 64K bytes in size each. The maximum size of the linear address space in real-address mode is 2^20 bytes.
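The real-address mode calculation itself is simple enough to show directly: the 16-bit segment value is multiplied by 16 (shifted left four bits) and added to the 16-bit offset, giving a 20-bit address. The sketch below models 8086 behaviour, where the result wraps around at 2^20; the function name is illustrative.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address for segment:offset in real-address mode (8086 model).

    The 16-bit segment value is shifted left four bits (multiplied by 16) and
    added to the 16-bit offset. The 8086 has only 20 address lines, so the
    sum wraps around at 2^20 (1 MB).
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0xFFFF, 0x0010)))  # 0x0: wraps past 1 MB on the 8086
```

Because the offset is 16 bits, each segment spans at most 64 KB, and many segment:offset pairs name the same byte (0x1235:0x0000 is also 0x12350). On later processors, whether FFFF:0010 wraps depends on the A20 address line, which this sketch does not model.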
Figure 7 Three Memory Management Models
32-Bit vs. 16-Bit Address and Operand Sizes
The processor can be configured for 32-bit or 16-bit address and operand sizes. With 32-bit address and operand sizes, the maximum linear address or segment offset is FFFFFFFFH (2^32 - 1), and operand sizes are typically 8 bits or 32 bits. With 16-bit address and operand sizes, the maximum linear address or segment offset is FFFFH (2^16 - 1), and operand sizes are typically 8 bits or 16 bits. When using 32-bit addressing, a logical address (or far pointer) consists of a 16-bit segment selector and a 32-bit offset; when using 16-bit addressing, it consists of a 16-bit segment selector and a 16-bit offset.
Instruction prefixes allow temporary overrides of the default address and/or operand sizes from within a program. When operating in protected mode, the segment descriptor for the currently executing code segment defines the default address and operand size. A segment descriptor is a system data structure not normally visible to application code. Assembler directives allow the default addressing and operand size to be chosen for a program. The assembler and other tools then set up the segment descriptor for the code segment appropriately. When operating in real-address mode, the default addressing and operand size is 16 bits. An address-size override can be used in real-address mode to enable 32-bit addressing; however, the maximum allowable 32-bit address is still 0000FFFFH (2^16 - 1).
Figure 8 Application Programming Registers
REGISTERS
The processor provides 16 registers for use in general system and application programming. As shown in Figure 8, these registers can be grouped as follows:
• General-purpose data registers. These eight registers are available for storing operands and pointers.
• Segment registers. These registers hold up to six segment selectors.
• Status and control registers. These registers report and allow modification of the state of the processor and of the program being executed.
General-Purpose Data Registers
The 32-bit general-purpose data registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP are provided for holding the following items:
• Operands for logical and arithmetic operations
• Operands for address calculations
Although all of these registers are available for general storage of operands, results, and pointers, caution should be used when referencing the ESP register. The ESP register holds the stack pointer and as a general rule should not be used for any other purpose. Many instructions assign specific registers to hold operands. For example, string instructions use the contents of the ECX, ESI, and EDI registers as operands. When using a segmented memory model, some instructions assume that pointers in certain registers are relative to specific segments. For instance, some instructions assume that a pointer in the EBX register points to a memory location in the DS segment.
The following is a summary of these special uses:
• EAX: Accumulator for operands and results data.
• EBX: Pointer to data in the DS segment.
• ECX: Counter for string and loop operations.
• EDX: I/O pointer.
• ESI: Pointer to data in the segment pointed to by the DS register; source pointer for string operations.
• EDI: Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations.
• ESP: Stack pointer (in the SS segment).
• EBP: Pointer to data on the stack (in the SS segment).
As shown in the accompanying figure, the lower 16 bits of the general-purpose registers map directly to the register set found in the 8086 and Intel 286 processors and can be referenced with the names AX, BX, CX, DX, BP, SP, SI, and DI. Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).
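The register aliasing just described can be modelled with masks and shifts. In this illustrative Python sketch (the function names are hypothetical), a 32-bit EAX value is split into its AX, AH and AL views, and an 8-bit write to AL is shown to leave the upper 24 bits untouched.

```python
from __future__ import annotations


def eax_views(eax: int) -> dict[str, int]:
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    return {
        "AX": eax & 0xFFFF,       # bits 0-15
        "AH": (eax >> 8) & 0xFF,  # bits 8-15 (high byte of AX)
        "AL": eax & 0xFF,         # bits 0-7 (low byte of AX)
    }


def write_al(eax: int, al: int) -> int:
    """Return the new EAX after an 8-bit write to AL; bits 8-31 are preserved."""
    return (eax & 0xFFFFFF00) | (al & 0xFF)


views = eax_views(0x12345678)
print({name: hex(v) for name, v in views.items()})  # {'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_al(0x12345678, 0xFF)))               # 0x123456ff
```

The same masks apply to EBX/BX/BH/BL, ECX/CX/CH/CL and EDX/DX/DH/DL.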
Segment Registers
The segment registers (CS, DS, SS, ES, FS, and GS) hold 16-bit segment selectors. A segment selector is a special pointer that identifies a segment in memory. To access a particular segment in memory, the segment selector for that segment must be present in the appropriate segment register. When writing application code, you generally create segment selectors with assembler directives and symbols. The assembler and other tools then create the actual segment selector values associated with these directives and symbols. If you are writing system code, you may need to create segment selectors directly.
How segment registers are used depends on the type of memory management model that the operating system or executive is using. When using the flat (unsegmented) memory model, the segment registers are loaded with segment selectors that point to overlapping segments, each of which begins at address 0 of the linear address space (as shown in the accompanying figure). These overlapping segments then comprise the linear address space for the program. Typically, two overlapping segments are defined: one for code and another for data and stacks. (The CS segment register points to the code segment and all the other segment registers point to the data and stack segment.)
When using the segmented memory model, each segment register is ordinarily loaded with a different segment selector so that each segment register points to a different segment within the linear address space (as shown in Figure 11).
Figure 11 Use of Segment Registers in Segmented Memory Model
Each of the segment registers is associated with one of three types of storage: code, data, or stack. For example, the CS register contains the segment selector for the code segment, where the instructions being executed are stored. The processor fetches instructions from the code segment, using a logical address that consists of the segment selector in the CS register and the contents of the EIP register. The EIP register contains the offset within the code segment of the next instruction to be executed. The CS register cannot be loaded explicitly by an application program. Instead, it is loaded implicitly by instructions or internal processor operations that change program control (such as procedure calls, interrupt handling, or task switching).
The DS, ES, FS, and GS registers point to four data segments. The availability of four data segments permits efficient and secure access to different types of data structures. For example, four separate data segments might be created: one for the data structures of the current module, another for the data exported from a higher-level module, a third for a dynamically created data structure, and a fourth for data shared with another program. To access additional data segments, the application program must load segment selectors for these segments into the DS, ES, FS, and GS registers, as needed.
The SS register contains the segment selector for a stack segment, where the procedure stack is stored for the program, task, or handler currently being executed. All stack operations use the SS register to find the stack segment. Unlike the CS register, the SS register can be loaded explicitly, which permits application programs to set up multiple stacks and switch among them.
The four segment registers CS, DS, SS, and ES are the same as the segment registers found in the Intel 8086 and Intel 286 processors, and the FS and GS registers were introduced into the Intel Architecture with the Intel386 family of processors.
EFLAGS Register
The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags. Figure 3-7 defines the flags within this register. Following initialization of the processor (either by asserting the RESET pin or the INIT pin), the state of the EFLAGS register is 00000002H. Bits 1, 3, 5, 15, and 22 through 31 of this register are reserved. Software should not use or depend on the states of any of these bits.
Some of the flags in the EFLAGS register can be modified directly, using special-purpose instructions (described in the following sections). There are no instructions that allow the whole register to be examined or modified directly. However, the following instructions can be used to move groups of flags to and from the procedure stack or the EAX register: LAHF, SAHF, PUSHF, PUSHFD, POPF, and POPFD. After the contents of the EFLAGS register have been transferred to the procedure stack or EAX register, the flags can be examined and modified using the processor's bit manipulation instructions (BT, BTS, BTR, and BTC).
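Once the flags have been copied out (for example with PUSHFD followed by a POP into a general-purpose register), examining them amounts to bit testing. The Python sketch below decodes a saved EFLAGS image; the bit positions are those of the Intel Architecture, while the table and function names are illustrative. Only the status flags, the DF control flag and two system flags (TF and IF) are included.

```python
from __future__ import annotations

# Bit positions of selected EFLAGS flags in the Intel Architecture.
EFLAGS_BITS = {
    "CF": 0,   # carry (status)
    "PF": 2,   # parity (status)
    "AF": 4,   # auxiliary carry (status)
    "ZF": 6,   # zero (status)
    "SF": 7,   # sign (status)
    "TF": 8,   # trap (system)
    "IF": 9,   # interrupt enable (system)
    "DF": 10,  # direction (control)
    "OF": 11,  # overflow (status)
}


def decode_eflags(image: int) -> dict[str, bool]:
    """Report which of the selected flags are set in a saved EFLAGS image."""
    return {name: bool((image >> bit) & 1) for name, bit in EFLAGS_BITS.items()}


def set_flag(image: int, name: str) -> int:
    """Set one flag in a saved image, as a BTS on that bit position would."""
    return image | (1 << EFLAGS_BITS[name])


# After RESET or INIT the register holds 00000002H: none of these flags is set;
# only reserved bit 1 is 1.
print([name for name, is_set in decode_eflags(0x00000002).items() if is_set])  # []
print(hex(set_flag(0x00000002, "DF")))  # 0x402
```

Writing a modified image back into EFLAGS would take a PUSH of the register followed by POPFD.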
When suspending a task (using the processor's multitasking facilities), the processor automatically saves the state of the EFLAGS register in the task state segment (TSS) for the task being suspended. When binding itself to a new task, the processor loads the EFLAGS register with data from the new task's TSS.
When a call is made to an interrupt or exception handler procedure, the processor automatically saves the state of the EFLAGS register on the procedure stack. When an interrupt or exception is handled with a task switch, the state of the EFLAGS register is saved in the TSS for the task being suspended.
Instruction Pointer
The instruction pointer (EIP) register contains the offset in the current code segment for the next instruction to be executed. It is advanced from one instruction boundary to the next in straight-line code, or it is moved ahead or backwards by a number of instructions when executing JMP, Jcc, CALL, RET, and IRET instructions.
The EIP register cannot be accessed directly by software; it is controlled implicitly by control-transfer instructions (such as JMP, Jcc, CALL, and RET), interrupts, and exceptions. The only way to read the EIP register is to execute a CALL instruction and then read the value of the return instruction pointer from the procedure stack. The EIP register can be loaded indirectly by modifying the value of a return instruction pointer on the procedure stack and executing a return instruction (RET or IRET).
All Intel Architecture processors prefetch instructions. Because of instruction prefetching, an instruction address read from the bus during an instruction load does not match the value in the EIP register. Even though different processor generations use different prefetching mechanisms, the function of the EIP register to direct program flow remains fully compatible with all software written to run on Intel Architecture processors.
Operand-size and Address-size Attributes
When the processor is executing in protected mode, every code segment has a default operand-size attribute and address-size attribute. These attributes are selected with the D (default size) flag in the segment descriptor for the code segment. When the D flag is set, the 32-bit operand-size and address-size attributes are selected; when the flag is clear, the 16-bit size attributes are selected. When the processor is executing in real-address mode, virtual-8086 mode, or SMM, the default operand-size and address-size attributes are always 16 bits.
The operand-size attribute selects the sizes of operands that instructions operate on. When the 16-bit operand-size attribute is in force, operands can generally be either 8 bits or 16 bits, and when the 32-bit operand-size attribute is in force, operands can generally be 8 bits or 32 bits. The address-size attribute selects the sizes of addresses used to address memory: 16 bits or 32 bits. When the 16-bit address-size attribute is in force, segment offsets and displacements are 16 bits. This restriction limits the size of a segment that can be addressed to 64 KBytes. When the 32-bit address-size attribute is in force, segment offsets and displacements are 32 bits, allowing segments of up to 4 GBytes to be addressed. The default operand-size attribute and/or address-size attribute can be overridden for a particular instruction by adding an operand-size and/or address-size prefix to an instruction. The effect of this prefix applies only to the instruction it is attached to.
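The arithmetic behind these limits, and the one-instruction scope of the override prefix, can be sketched as follows. The address-size override prefix byte is 67H (the operand-size override is 66H); the function names are illustrative.

```python
def max_offset(address_size_bits: int) -> int:
    """Largest segment offset under a 16- or 32-bit address-size attribute."""
    if address_size_bits not in (16, 32):
        raise ValueError("address-size attribute must be 16 or 32 bits")
    return (1 << address_size_bits) - 1


def address_size_for(default_bits: int, has_67h_prefix: bool) -> int:
    """Address size for one instruction: a 67H prefix toggles 16 <-> 32 bits
    for that instruction only; the segment's default is unchanged."""
    if has_67h_prefix:
        return 32 if default_bits == 16 else 16
    return default_bits


for bits in (16, 32):
    top = max_offset(bits)
    print(f"{bits}-bit: max offset {top:X}H, segment size {top + 1} bytes")
# 16-bit: max offset FFFFH, segment size 65536 bytes
# 32-bit: max offset FFFFFFFFH, segment size 4294967296 bytes
```

In real-address mode the default is 16 bits, so `address_size_for(16, True)` gives 32; as noted earlier, the processor still limits such offsets to 0000FFFFH in that mode.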
Pentium II
The Pentium II incorporates many of the salient features of the Pentium Pro and Pentium MMX; however, its physical package was based on the SECC/Slot 1 interface and its 512 KB L2 cache ran at only half the processor's internal clock rate. First generation Pentium II Klamath CPUs operated at 233, 266, 300 and 333 MHz with an FSB of 66 MHz and a core voltage of 2.8 Volts. In 1… FSB and at 2.0 Volts at the core. Its major improvements were:
• 16 KB L1 instruction and data caches
• L2 cache with non-proprietary, commercially available SRAM
• Improved 16-bit capability through segment register caches
• MMX unit.
Standard Pentium IIs could only be used in dual-processor configurations; however, Pentium II Xeon CPUs had up to 2 MB of L2 cache and could be used in multiprocessor configurations of up to 4 processors.
Celeron
The Celeron began as a scaled-down version of the Pentium II and was designed to compete against similar offerings from Intel's competitors. The Klamath-based Covington core ran at 266 and 300 MHz and was constructed without an L2 cache. However, adverse market reaction saw the Deschutes-based Mendocino core introduced with a 128 KB L2 cache, running at 300, 333, 400, 433, 466, 500 and 533 MHz. Celerons have the same L1 cache as their bigger brothers, the Pentium II and III. The important distinction is that the L2 cache operates at full CPU clock rates, unlike the Pentium II and the SECC-packaged Pentium III. Later variants of the Pentium III had an on-die L2 cache which ran at full CPU clock rate. The Celeron III (Coppermine128 core) has the same internal features as the Pentium III, but has reduced functionality: a 66 MHz bus clock rate, no error correction codes for the data bus or parity creation for the address bus, and a maximum of 4 GB of address space. Celeron III Coppermine128s with a 1.6 V core and a 100 MHz FSB were produced in 2001 and operated at core speeds of up to 1.1 GHz. Tualatin-core Celerons were put on the market in late 2001 and ran at 1.2 GHz. 2002 saw the final versions produced, running at 1.3 and 1.4 GHz.
Pentium III
The only significant difference between the Pentium III and its predecessor was the inclusion of 72 new instructions, known as the Internet Streaming Single Instruction Multiple Data Extensions (ISSE); they include integer and floating point operations. However, as with the original MMX instructions, application programmers must include the corresponding extensions if any use is to be made of these instructions. The most controversial and short-lived addition was the CPU ID number, which could be used for software licensing and e-commerce. After protest from various sources, Intel disabled it by default, but did not remove it. Depending on the BIOS and motherboard manufacturer, it may remain disabled, but it can be enabled via the BIOS. In reality, Pentium III performance was based … The three variants of the Pentium III were the Katmai, Coppermine, and Tualatin. The Katmai first introduced ISSE (MMX/2), as described above, with an FSB of 100 MHz. The Coppermine also introduced the Advanced Transfer Cache (ATC) for the L2 cache, which reduced cache capacity to 256 KB but saw the cache run at full processor speed. Also, the 64-bit Katmai cache bus was quadrupled to 256 bits.
The Coppermine also uses an 8-way set associative cache, rather than the 4-way set associative cache in the Katmai and older Pentiums. Bringing the cache on-die also increased the transistor count to 30 million, from the 10 million on the Katmai. Another advance in the Coppermine was Advanced System Buffering (ASB), which simply increased the number of buffers to account for the increased FSB speed of 133 MHz. The Pentium III Tualatin had a reduced die size that allowed it to run at higher speeds. Tualatins use a 133 MHz FSB and have ATC and ASB.
Pentium IV: The Next Generation
The release of the Pentium IV in 2000 heralded the seventh generation of Intel microprocessors. The release was premature, however, prompted by the fact that Intel's major competitor in the microprocessor market, the AMD Athlon, outperformed the Pentium III Coppermine with its 1 GHz performance threshold. Intel was not ready to answer the competition through the early release of the next member of its Pentium III family, the Pentium III Tualatin, which was designed to break the 1 GHz barrier. Previous attempts to do so with the Pentium III Coppermine 1.13 GHz met with failure due to design flaws. Paradoxically, however, Intel was in a position to release the first of the Pentium IV family, the Willamette, which ran at 1.3, 1.4 and 1.5 GHz, using an FC-PGA package on the short-lived Socket 423, which was a design dead end for motherboard manufacturers and consumers. Worse still, the only Intel chipset available for the Pentium IV could only house the highly expensive Rambus DRAM. In addition, the early versions of the Pentium IV CPU were outperformed by slower AMD Athlons. Nevertheless, the core capability of Intel's seventh generation processors is that they can run at ever-higher speeds. For example, Intel's sixth generation Pentiums began at 120 MHz with the Pentium Pro and ended at over 1.2 GHz, a tenfold increase. The bottom line here is that Intel's seventh generation chips could end up running at speeds of 10 GHz or more. How has Intel achieved this? Through a radical redesign of the Pentium's core architecture. The following sections illustrate the major advances.
The most visible feature of the new Pentium IV is the Front Side Bus (FSB), which initially operated at an equivalent speed of 400 MHz, compared to 100 MHz on the Pentium III. The Pentium III has a 64-bit data bus that delivered a data throughput of 1.066 GB/s at 133 MHz (64 bits = 8 bytes per transfer × 133 MHz). The Pentium IV FSB is also 64 bits wide; however, its 100 MHz bus clock is "quad-pumped", giving an effective bus speed of 400 MHz and a data transfer rate of 3.2 GB/s. The newer (as of late 2002) Pentium IV chipsets operate at 133 MHz and deliver an effective bus speed of 533 MHz and a data transfer rate of 4.2 GB/s. Thus, the Pentium IV exchanges data with the i845 and i850 chipsets faster than any other processor, removing the Pentium III's most significant bottleneck. Intel's 850 chipset for the Pentium IV uses two Rambus channels to 2-4 RDRAM RIMMs. Together, these two RDRAM channels are able to deliver the same data bandwidth as the Pentium IV FSB. As the later discussion on DRAM indicates, similar transfer rates are delivered using the i845 chipset and DDR DRAM. This combination enables Pentium 4 systems to have the highest data transfer rates between processor, system and main memory, which is a clear benefit.
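The bandwidth figures quoted above follow from bus width × clock rate × transfers per clock. The small Python helper below (a hypothetical name) reproduces them, taking 1 GB as 10^9 bytes and the nominal 133 MHz clock as 133.33 MHz; the text's 4.2 GB/s is the last result truncated.

```python
def bus_bandwidth_gb_s(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes).

    bandwidth = (width in bytes) x (clock rate) x (transfers per clock)
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


print(round(bus_bandwidth_gb_s(64, 133.33), 3))     # Pentium III, 133 MHz: 1.067
print(round(bus_bandwidth_gb_s(64, 100, 4), 3))     # Pentium IV, quad-pumped 100 MHz: 3.2
print(round(bus_bandwidth_gb_s(64, 133.33, 4), 3))  # Pentium IV, quad-pumped 133 MHz: 4.267
```

Quad-pumping leaves the bus clock unchanged but moves four 8-byte transfers per clock, which is where the fourfold gain comes from.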
Advanced Transfer Cache
The first major improvement is the integration of the L2 cache and the evolution of the Advanced Transfer Cache introduced in the Pentium III Coppermine, which had just 256 KB of L2 cache. The first Pentium IV, the Willamette, had a similar sized cache, but could transfer data into the CPU's core logic at 48 GB per second at a CPU clock speed of 1.5 GHz. In comparison, the Coppermine could only transfer 16 GB/s at 1 GHz to its L1 instruction cache. Note also that the Front Side Bus speed of the Pentium III was 133 MHz, while the Pentium IV Willamette had an FSB speed of 400 MHz. In addition, the Pentium IV L2 cache has 128-byte cache lines, which are divided into two 64-byte segments. For example, when the Pentium IV fetches data from RAM, it does so in 64-byte burst transfers. However, if just four bytes (32 bits) are required, this block transfer becomes inefficient. To offset this, the cache has advanced Data Prefetch Logic that predicts the data required by the cache and loads it into the L2 cache in advance. The Pentium IV's hardware prefetch logic significantly accelerates the execution of processes that operate on large data arrays. The read latency (the time it takes the cache to transfer data into the pipeline) of the Pentium 4's L2 cache is 7 clock pulses. However, its connection to the core logic (the Translation Lookaside Buffer in this case; there is no I-Cache in the Pentium IV) is 256 bits wide and clocked at the full processor speed. The second member of the Pentium IV family was the Northwood, which had a 512 KB L2 cache running at the processor's clock speed.
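To illustrate how addresses map onto the 128-byte lines and their 64-byte segments (sectors) described above, and why a 4-byte request is a poor fit for a 64-byte burst, here is a short Python sketch. The helper names are illustrative, and the model ignores set selection and tags.

```python
from __future__ import annotations

LINE_SIZE = 128   # bytes per Pentium IV L2 cache line, as described above
SECTOR_SIZE = 64  # each line is split into two 64-byte sectors (burst size)


def locate(address: int) -> tuple[int, int, int]:
    """Return (line number, sector within the line, byte offset in the sector)."""
    line = address // LINE_SIZE
    sector = (address % LINE_SIZE) // SECTOR_SIZE
    offset = address % SECTOR_SIZE
    return line, sector, offset


def burst_efficiency(bytes_needed: int, burst_size: int = SECTOR_SIZE) -> float:
    """Fraction of one burst transfer that carries data the program asked for."""
    return min(bytes_needed, burst_size) / burst_size


print(locate(200))          # (1, 1, 8): byte 8 of the second sector of line 1
print(burst_efficiency(4))  # 0.0625: a 4-byte load uses 1/16 of a 64-byte burst
```

When accesses are sequential, as in large array processing, the rest of each burst is soon used, which is why prefetching recovers most of this apparent waste.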
L1 Data Cache
The second major development in cache technology is that the Pentium IV has only one 8 KB L1 data cache. In place of the L1 instruction cache (I-Cache) of the 6th generation Pentiums, it has a much more efficient Execution Trace Cache.
Intel reduced the size of its L1 data cache to enable a very low latency of only 2 clock cycles. This results in an overall read latency (the time it takes to read data from cache memory) of less than half that of the Pentium III's L1 data cache.
7th Generation NetBurst Micro-Architecture
Intel's NetBurst Micro-Architecture provides a firm foundation for future advances in processor performance, particularly where speed of operation is concerned. The NetBurst micro-architecture has four major components: Hyper Pipelined Technology, Rapid Execution Engine, Execution Trace Cache and a 400 MHz system bus. Also incorporated are four significant improvements over the sixth generation architecture: Advanced Dynamic Execution, Advanced Transfer Cache, Enhanced Floating Point & Multimedia Unit, and Streaming SIMD Extensions 2.
Hyper Pipelined Technology
The traditional approach to increasing a CPU's clock speed was to make smaller processors by shrinking the die. An alternative strategy, evident in RISC processors, is to make the CPU more efficient: do less per clock cycle and have more clock cycles. To do this in a CISC-based processor, Intel simply increased the number of stages in the processor's pipeline. The upshot is that less is accomplished per clock cycle. This is akin to a "bucket brigade" passing smaller buckets rapidly down a chain, rather than larger buckets at a slower rate. For example, the U and V integer pipelines in the original Pentium each had just five stages: instruction fetch, decode 1, decode 2, execute and write-back. The Pentium Pro introduced the P6 architecture, with a pipeline consisting of 10 stages. The P7 NetBurst micro-architecture in the Pentium IV increased the number of stages to 20. Intel terms this its Hyper Pipelined Technology.
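The bucket-brigade trade-off can be put into numbers with a simple model of an ideal pipeline: filling a k-stage pipeline takes k cycles, after which one instruction completes per cycle, while splitting the same logic into more stages shortens each cycle. The sketch below assumes a fixed 10 ns of logic per instruction and a 0.1 ns latch overhead per stage (both hypothetical figures) and ignores stalls and branch mispredictions; only the 5, 10 and 20 stage counts come from the text, and the function names are illustrative.

```python
def pipeline_cycles(stages: int, instructions: int) -> int:
    """Cycles for an ideal pipeline with no stalls: `stages` cycles to fill it,
    then one instruction completes every cycle."""
    if stages < 1 or instructions < 0:
        raise ValueError("need stages >= 1 and instructions >= 0")
    return 0 if instructions == 0 else stages + instructions - 1


def clock_period_ns(stages: int, logic_ns: float = 10.0, latch_ns: float = 0.1) -> float:
    """Spread a fixed amount of logic evenly over the stages; each stage adds latch overhead.
    The 10 ns and 0.1 ns defaults are hypothetical."""
    return logic_ns / stages + latch_ns


def run_time_ns(stages: int, instructions: int) -> float:
    """Total time = cycles needed x length of one cycle."""
    return pipeline_cycles(stages, instructions) * clock_period_ns(stages)


# Stage counts from the text: original Pentium, P6 (Pentium Pro), NetBurst (Pentium IV).
for stages in (5, 10, 20):
    period = clock_period_ns(stages)
    print(f"{stages:2d} stages: {1000 / period:5.0f} MHz, "
          f"1 instruction {run_time_ns(stages, 1):5.1f} ns, "
          f"1000 instructions {run_time_ns(stages, 1000):7.1f} ns")
```

Under these assumptions the 20-stage pipeline clocks fastest and finishes a long instruction stream soonest, yet a single instruction takes longer to pass through it than through the 5-stage pipeline, because every added stage adds latch overhead.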
Enhanced Branch Prediction
The key to pipeline efficiency and operation is effective branch prediction, hence the much improved branch prediction logic in the Pentium IV's Advanced Dynamic
point operations, which are not prone to the same type of branch prediction inefficiencies as integer-based instructions.
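Deep pipelines make branch prediction important because a wrong guess discards the work already in flight. The two-bit saturating counter below is a generic textbook scheme, not a description of the Pentium IV's actual predictor, and the cost model (roughly one lost cycle per pipeline stage for each misprediction) is an assumed simplification; the class and function names are hypothetical.

```python
from typing import Iterable


class TwoBitPredictor:
    """Textbook 2-bit saturating counter for a single branch.

    States 0-1 predict 'not taken', states 2-3 predict 'taken'. One surprising
    outcome only nudges the counter, so a strongly held prediction survives it.
    """

    def __init__(self, state: int = 2) -> None:
        self.state = state  # start in 'weakly taken'

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the actual outcome, saturating at 0 and 3.
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def mispredictions(outcomes: Iterable[bool], predictor: TwoBitPredictor) -> int:
    """Count wrong guesses while training the predictor on the real outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch taken 9 times and then not taken on exit, with the loop run 10 times.
loop = ([True] * 9 + [False]) * 10
misses = mispredictions(loop, TwoBitPredictor())
# Assumed cost model: each miss throws away about one pipeline's worth of stages.
for stages in (5, 10, 20):
    print(f"{stages:2d}-stage pipeline: {misses} misses, about {misses * stages} lost cycles")
```

The counter only misses on each loop exit, but the assumed penalty for those misses grows with pipeline depth, which is why a 20-stage design needs better prediction than a 5-stage one.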
Streaming SIMD Extensions 2
SSE2 is the follow-up to Intel's Streaming SIMD (Single Instruction Multiple Data) Extensions (SSE). SIMD is a technology that allows a single instruction to be applied to multiple datasets at the same time. This is especially useful when processing 3D graphics. SIMD-FP (Floating Point) extensions help speed up graphics processing by taking the multiplication, addition and reciprocal functions and applying them to multiple datasets simultaneously. Recall that SIMD first appeared with the Pentium MMX, which incorporated 57 MMX instructions. These are essentially SIMD-Int (integer) instructions. Intel first introduced SIMD-FP extensions in the Pentium III with its 70 Streaming SIMD Extensions (SSE) instructions. Intel introduced 144 new instructions in the Pentium IV that enable it to handle two 64-bit SIMD-INT operations and two double-precision 64-bit SIMD-FP operations. This is in contrast to the 32-bit operations that the Pentium MMX and III (under SSE) handle. The major benefit of SSE2 is greater performance, particularly with SIMD-FP instructions, as it increases the processor's ability to handle higher-precision floating point calculations. As with MMX and SSE, these instructions require software support.
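Python cannot issue SSE instructions directly, so the sketch below only models the idea: a 128-bit XMM register is treated as 16 bytes that can be viewed as four 32-bit floats (SSE), two 64-bit doubles or two 64-bit integers (SSE2), and one "instruction" adds every lane at once. The helper names `to_reg`, `from_reg` and `simd_add` are hypothetical; `struct` is Python's standard byte-packing module.

```python
import struct

# Hypothetical model: a 128-bit XMM register is just 16 raw bytes. The struct
# format decides how those bytes are split into lanes (little-endian, as on x86):
#   "<4f" -> four 32-bit floats (SSE), "<2d" -> two 64-bit doubles (SSE2 SIMD-FP),
#   "<2q" -> two 64-bit integers (SSE2 SIMD-INT).
XMM_BYTES = 16


def to_reg(fmt, values):
    """Pack lane values into a 16-byte 'register'."""
    reg = struct.pack(fmt, *values)
    if len(reg) != XMM_BYTES:
        raise ValueError(f"{fmt!r} does not fill exactly 128 bits")
    return reg


def from_reg(fmt, reg):
    """Read the lanes back out of a 16-byte 'register'."""
    return struct.unpack(fmt, reg)


def simd_add(fmt, a, b):
    """Model of one packed-add instruction: each lane of a is added to the same lane of b."""
    return to_reg(fmt, [x + y for x, y in zip(from_reg(fmt, a), from_reg(fmt, b))])


doubles = simd_add("<2d", to_reg("<2d", (1.5, 2.25)), to_reg("<2d", (10.0, 20.0)))
print(from_reg("<2d", doubles))  # (11.5, 22.25)

ints = simd_add("<2q", to_reg("<2q", (7, -3)), to_reg("<2q", (35, 10)))
print(from_reg("<2q", ints))  # (42, 7)
```

Unlike the real packed-integer add, which wraps around on overflow, this model raises `struct.error` if a 64-bit lane overflows; it illustrates lane layout, not instruction timing.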
Celeron IV
The Celeron IV first appeared in 2002. It was based on the Pentium IV and could be accommodated on Socket 478 motherboards. In the first model, based on the Willamette core, the L2 cache was halved to 128 KB and the processor ran at 1.7 GHz. Later models ran at 1.8, 1.9 and 2 GHz. The next member was based on the Northwood core and had a 256 KB L2 cache. Used with the i845 chipset, the new Celerons are now good-value, entry-level processors.
Additional Resources
The following diagrams of the Pentium III, IV and AMD Athlon CPUs are provided to highlight the architectural features of these microprocessors and to enhance the foregoing text. The figures have been obtained from Tom's Hardware Guide (NOT this Tom); further insights into the Intel architectures may be found at: ([email protected]20001120/index.html).