James Madison University
PA-RISC 2.0 64-Bit Processors: History, Features, and Architecture
CS-350 Section 2
Spring 2004
By: Joshua Madagan, Adam Gray, Christie Kummers
Table of Contents
Computing in 64-bit
    The Dawn of the 64-bit chip
    Differences between 32-bit and 64-bit chips
    Benefits of switching to 64-bit
History of PA-8x00 Processors
    PA-8000
    PA-8200
    PA-8500
    PA-8600
    PA-8700
    PA-8800
Features and Detailed Architecture
    RISC
    64-bit Computing
    Out of Order (OoO) Execution
    Branch Prediction
    4-Way Superscalar Execution
    Cache System
    Physical Architecture
Conclusion
Appendix
Works Cited
Computing in 64-bit
As technology advances, computers are constantly becoming faster and more powerful. A desktop system bought today can be superseded by newer technology in about three months. To meet the growing demand for more power and better performance, sometimes drastic measures have to be taken. In this case, that measure is the move from 32-bit processors to 64-bit processors.
The Dawn of the 64-bit chip
64-bit computing is the next step in advancing computer technology. Hewlett-Packard (HP) broke onto the 64-bit computing scene in late 1995 and early 1996 with the release of its PA-RISC 8000 (PA-8000) for large-scale servers. Currently, HP markets its 64-bit chips for servers only, and these systems run mainly on Linux. However, to compete in the marketplace, HP debuted its original workstation at a price that was considerably low for the company, just under $25,000. The bargain price was a response to pressure from competitors. Thanks to its commitment to performance, HP managed to attract a very promising customer, the United States Army. The Army signed with HP when the PA-8000 was first released (Hayes, 1996).
Differences between 32-bit and 64-bit chips
The difference between 32-bit and 64-bit chips is substantial. With today's 32-bit Intel and AMD processors, a computer can address up to four gigabytes of memory. On Windows-based machines, that memory is divided between the operating system and applications, so the most memory that any one program can use is two gigabytes. A 64-bit central processing unit (CPU), on the other hand, can handle more memory and larger files. A 64-bit processor can address around sixteen exabytes of memory, which is over sixteen billion gigabytes. This larger address space allows far more memory to be addressed (Mainelli, 2003).
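The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming byte-addressable memory and binary units (1 GB = 2^30 bytes, 1 EB = 2^60 bytes); the function name is illustrative only.

```python
# Sketch: how much memory an n-bit address can reach.
# Assumes byte-addressable memory and binary units.
GB = 2**30  # bytes in one gigabyte (binary)
EB = 2**60  # bytes in one exabyte (binary)


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


if __name__ == "__main__":
    for bits in (32, 64):
        total = addressable_bytes(bits)
        print(f"{bits}-bit addresses reach {total // GB:,} GB")
```

Under these assumptions a 32-bit address reaches 4 GB, while a 64-bit address reaches 16 EB, or roughly 17.2 billion GB.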
To get the full benefit of 64-bit computing, you need a 64-bit operating system that uses the 64-bit addressing and arithmetic capabilities of the CPU. A 64-bit processor provides more resources to the system and its programs. It is particularly useful when the system is called upon to perform integer arithmetic, where sixty-four bits provide better performance and precision than thirty-two bits. Even today, most compilers support 64-bit data types on 32-bit CPUs, resulting in increased performance on larger data types (Hewlett-Packard, 2004).
An application that does not require any 64-bit features should remain a 32-bit program, even when it runs on a 64-bit system. On HP's systems, 32-bit applications can run on both 32-bit and 64-bit systems, saving the user the cost of multiple versions of the same program. HP stated that a 32-bit application would run seventy to one hundred percent faster on its original 64-bit PA-8000 than on a 32-bit system (Foley, 1996). An application that is recompiled for 64-bit produces a larger file than before, and system performance could decline because of the number of cache misses while the program is running in 64-bit mode (Hewlett-Packard, 2004 and Jacobs, 1998). However, if a client wanted to run their applications in 64-bit mode, recompiling the programs would be the only major issue, as long as they followed the instructions supplied by their vendor (Garvey, 1998).
Benefits of switching to 64-bit
The benefits of 64-bit computing are considerable; it is all about speed (Foley, 1996). When 64-bit computing is adopted in any environment, it gives the user more powerful hardware and increased application performance (Jacobs, 1998). Currently, HP's 64-bit technology is used for storing large amounts of data more than anything else, and is common in data warehouses and similar database work (Jacobs, 1998). Some applications do not fit on 32-bit machines, forcing systems to store data in multiple files instead of one. Placing such a large application on a 64-bit machine lets the system operate at a higher performance level. With more memory and address space, there is less swapping, and searches can be sped up considerably (Hewlett-Packard, 2004)1.
Besides being well suited to database work, 64-bit processors will also be a driving force for graphics-intensive programs. A 64-bit system is more capable of handling the large files that are usually associated with graphics and movies. This would be a good match for real-time multimedia programs on the World Wide Web (Foley, 1996). With extra processor speed, programmers can add an incredible amount of detail to games, leading to more realistic sounds, environments, and textures. Characters would be more detailed, with more human-like features, and even computer-controlled characters would play more realistically. Until 64-bit technology is adopted at every level of the system architecture, its full benefits cannot be realized (Garvey, 1998). 64-bit computing will also benefit computer-aided design (CAD), three-dimensional simulation, business modeling, and semiconductor design. 64-bit systems could also be adopted for national defense and high-end decision support systems (Phan, 2002).
History of PA-8x00 Processors
PA-8000
The PA-8000 was originally introduced in January of 1996. It was the first chip to use the 64-bit PA-RISC 2.0 architecture, meaning that all integer registers and functional units were widened to 64 bits. It also allowed for faster translation from virtual addresses to physical addresses. The PA-8000 was given an Instruction Reorder Buffer, allowing the CPU to perform its own instruction scheduling. The PA-8000 was also equipped with dual floating-point and dual load/store units and no on-chip caches. All caches were placed off chip so that more data could be accessed per cycle. This made the latency nearly two cycles, but with complete pipelining it could be brought closer to one cycle. The PA-8000, and every PA-8x00 after it, also performed speculative execution. This means that the processor tries to guess which instructions are coming up and prepares for them accordingly. When it is time to actually perform the instruction, the predicted outcome and the actual outcome are compared. If they do not match, the predicted outcome is thrown out. The principle is that if things go as the processor predicts, it can work through its instructions much more quickly (Weissmann, 1999-2004).
1 See the Appendix for a table summarizing the sources of increased performance and scalability related to 64-bit computing, by type of application.
PA-8200
This version of the PA-8x00 was introduced in May of 1997. It was essentially an upgraded PA-8000. Performance improvements included 4Mb SRAMs with faster access times, allowing for a larger cache. The Translation Lookaside Buffer (TLB) and Branch History Table (BHT) were also enlarged to reduce “wasted cycles” (Weissmann, 1999-2004).
PA-8500
The PA-8500 was introduced in September of 1998. Once again the chip was made bigger and faster. A major change was that the L1 cache was integrated onto the CPU die. The TLB and BHT were enlarged again. The PA-8500 was also able to handle two memory operations at the same time, using the same dual-bank arrangement as the PA-8000’s off-chip data cache. All data caches on the PA-8500 are 0.5 MB and are implemented as four 0.125 MB arrays, each with a double-word of data. The instruction cache is a 0.5 MB four-way set associative pipelined cache with 128 bits of instruction (Weissmann, 1999-2004).
PA-8600
In January of 2000, the PA-8600 was introduced. Only minor modifications were made between the PA-8500 and PA-8600. The only real changes were a higher clock speed, modifications to the interface bus, reworked bus transactions, and the addition of a quasi-LRU (Least Recently Used) replacement policy for the instruction cache (Weissmann, 1999-2004).
PA-8700
The PA-8700 was introduced in August of 2001. Again, the PA-8700 was an upgrade of the PA-8500 design. The on-chip L1 cache and TLB were enhanced significantly, and a new CMOS process helped boost the clock frequency (Weissmann, 1999-2004).
PA-8800
The PA-8800 was introduced in October of 2001. No modifications were made to the core between the PA-8700 and the PA-8800. Rather, the PA-8800 is simply two PA-8700 cores put together on the same chip. This allowed the cores to run at up to 1 GHz and allowed for a combined 35 MB of L1 and L2 cache (LostCircuits, 2001).2
2 See the Appendix for a table that lists all of the HP PA-8x00 processors and the features that each possesses.
Features and Detailed Architecture
The basic architecture of the PA-8x00 64-bit processors has not changed much since the original PA-8000 was released. The standards set in the PA-RISC 2.0 specifications are implemented in all PA-8x00 chips. These include 64-bit data and address extensions, branch prediction, the use of a 4-way superscalar system, and out-of-order (OoO) execution, among other things (Hewlett Packard, 2000). However, several key items have been changed and improved during the lifecycle of the PA-8x00 family. The location of L1 cache has been changed during the process, components have been added, and paths widened (Weissmann, 1999-2004). The PA-RISC 64-bit processors have quite a few interesting features.
RISC
The PA-8x00 processors are Reduced Instruction Set Computer (RISC) designs. The instruction set is implemented directly in hardware, without the use of microcode. Hardware-implemented instructions are performed in one clock cycle, while microcoded instructions can take several cycles. The PA-RISC 2.0 processors also use a fixed instruction size of 32 bits. These instructions can easily be divided into parts, allowing for easier pipelining (Kane, 1996).
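To illustrate why a fixed 32-bit instruction size is easy to divide into parts, here is a minimal sketch of splitting an instruction word into fields with shifts and masks. The field layout used (a 6-bit opcode, two 5-bit register fields, and a 16-bit remainder) is a hypothetical example chosen for illustration and is not the actual PA-RISC 2.0 encoding.

```python
# Sketch: slicing a fixed-width 32-bit instruction word into fields.
# The default field widths are hypothetical, NOT the real PA-RISC 2.0 formats.

def split_fields(word: int, widths=(6, 5, 5, 16)) -> tuple:
    """Split a 32-bit word into fields, most significant field first."""
    if sum(widths) != 32 or not 0 <= word < 2**32:
        raise ValueError("expected a 32-bit word and widths summing to 32")
    fields = []
    shift = 32
    for width in widths:
        shift -= width                   # position of this field's lowest bit
        mask = (1 << width) - 1          # 'width' one-bits
        fields.append((word >> shift) & mask)
    return tuple(fields)


if __name__ == "__main__":
    word = (0b001101 << 26) | (3 << 21) | (7 << 16) | 0x1234
    print(split_fields(word))
```

Because every instruction has the same length, the decoder can apply the same shifts and masks to every word without first working out where the instruction ends.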
64-Bit Computing
The PA-8x00 processors are true 64-bit computers. All of the integer registers, Arithmetic Logic Units (ALUs), and shift and merge units are 64 bits wide. The address space can theoretically be up to 64 bits wide, but the PA-8000, PA-8200, PA-8500, and PA-8600 only support 40-bit addresses, while the PA-8700 and PA-8800 support 44-bit addresses. This allows for 1 TB and 16 TB of memory, respectively. Together, these features allow the PA-8x00 processors to access huge amounts of data quickly and operate on increasingly large numbers (Weissmann, 1999-2004).
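The memory limits quoted above follow directly from the physical address widths. As a quick check, assuming byte-addressable memory and binary terabytes (2^40 bytes), with the address widths taken from the paragraph above and the names below being illustrative:

```python
# Sketch: physical memory limit implied by each processor's address width.
TB = 2**40  # bytes in one terabyte (binary)

# Physical address widths as stated in the text above.
PHYSICAL_ADDRESS_BITS = {
    "PA-8000": 40, "PA-8200": 40, "PA-8500": 40, "PA-8600": 40,
    "PA-8700": 44, "PA-8800": 44,
}


def max_physical_memory_tb(processor: str) -> int:
    """Largest physically addressable memory, in TB, for a processor."""
    return 2 ** PHYSICAL_ADDRESS_BITS[processor] // TB


if __name__ == "__main__":
    for name in PHYSICAL_ADDRESS_BITS:
        print(name, max_physical_memory_tb(name), "TB")
```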
Out of Order (OoO) Execution
The PA-8x00 processors have the ability to schedule their own instructions (Hunt). This allows the processor to find instructions that can be executed simultaneously, and therefore make better use of its multiple execution units. The Instruction Reorder Buffer (IRB) stores a maximum of 28 computational instructions and 28 load/store instructions and determines which instructions can be executed (Weissmann, 1999-2004). As a result, instructions are not necessarily executed in program order. Branch prediction also comes into play here, because later instructions can be executed before earlier ones have finished and may have based their calculations on incorrect data. Out-of-order execution increases performance by keeping the pipelines busy, as instructions are constantly being fed to the execution units (Hunt).
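The idea of issuing instructions as soon as their operands are ready, rather than in program order, can be sketched with a toy scheduler. This is a simplified model and not HP's actual IRB logic: the instruction names, registers, and latencies are hypothetical, each register is assumed to be written at most once (as if renaming had already removed false dependencies), and the only hardware limit modeled is an issue width of 4.

```python
# Toy out-of-order issue model (illustrative assumptions, not the real IRB).
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str            # register written by this instruction
    srcs: tuple = ()     # registers read by this instruction
    latency: int = 1     # cycles until the result is available


def schedule(program, issue_width=4):
    """Return {instruction name: cycle in which it issues}."""
    produced = {ins.dest for ins in program}  # registers written in the program
    ready_at = {}                             # register -> cycle its value is ready
    issue_cycle = {}
    pending = list(program)
    cycle = 0
    while pending:
        slots = issue_width
        waiting = []
        for ins in pending:
            # A source is ready if no instruction produces it (initial value)
            # or if its producer's result is available by this cycle.
            ready = all(
                src not in produced or ready_at.get(src, float("inf")) <= cycle
                for src in ins.srcs
            )
            if ready and slots > 0:
                issue_cycle[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
                slots -= 1
            else:
                waiting.append(ins)
        pending = waiting
        cycle += 1
    return issue_cycle


if __name__ == "__main__":
    program = [
        Instr("load", "r1", ("r0",), latency=3),
        Instr("add", "r2", ("r1",)),        # must wait for the load
        Instr("mul", "r3", ("r4", "r5")),   # independent of the load
        Instr("sub", "r6", ("r3",)),        # depends only on the mul
    ]
    print(schedule(program))
```

In this example the `mul` and `sub` instructions issue while `add` is still waiting for the slow `load`, so the execution units stay busy even though program order is not followed.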
Branch Prediction
The PA-8x00 implements branch prediction in order to keep pipelines full while executing instructions that change the flow of control in programs. Conditional and looping statements create instances where the control could be passed to different places, depending on the outcome of the statement (Downey, 2000). To combat this, the PA-8x00 implements both static and dynamic branch prediction (Hewlett Packard, 2000). The processor guesses the outcome of a branch based on the code itself and on the history of the branch, if any, found in the BHT. Assuming that the prediction is correct, the processor continues from the branch until it receives the true result. If it finds that the result matches the prediction, the program continues on, having saved itself some cycles. However, if the result does not match the prediction, the provisionally executed instructions are thrown out and the IRB reverts back to the branch and continues feeding instructions from there, with the correct result (Hewlett Packard, 2000). Branch prediction provides increased performance through aiding OoO execution in keeping instructions flowing to the multiple execution units.
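A history-based predictor of the kind described above can be sketched as a table of small counters indexed by the branch address. The sketch below uses the common textbook scheme of 2-bit saturating counters; this scheme, the indexing function, and the class and function names are assumptions for illustration, since the text does not specify how the PA-8x00 BHT encodes its history.

```python
# Sketch of dynamic branch prediction with a branch history table (BHT).
# Uses textbook 2-bit saturating counters; NOT the documented PA-8x00 scheme.

class BranchHistoryTable:
    def __init__(self, entries=256):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries

    def _index(self, pc):
        # Instructions are 4 bytes wide, so drop the two low address bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(bht, pc, outcomes):
    """Run one branch through the predictor and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop branch that is taken nine times and then falls through.
    loop_branch = [True] * 9 + [False]
    print(count_mispredictions(BranchHistoryTable(), 0x4000, loop_branch))
```

For a loop branch like this one, the predictor is wrong only while it warms up and again when the loop exits; every other iteration continues without a pipeline flush.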
4-Way Superscalar Execution
The PA-8x00 processors implement a 4-way superscalar system. This means that the processor can execute 4 different instructions per clock cycle (Webopedia, 2001). Superscalar systems work by using multiple execution units so that long execution times do not waste cycles in the fetch, decode, and save stages (Downey, 2000). Superscalar execution requires a constant flow of data and instructions to see the benefits of simultaneous execution. This is achieved through a combination of advanced instruction scheduling algorithms, the ability to process instructions out of order, and branch prediction.
Cache System
The cache system has not remained the same throughout the PA-8x00 chips. All of the chips have used separate data and instruction L1 caches, as specified in the Harvard Architecture, but the placement of these has changed. The PA-8000 and PA-8200 both make use of off-chip L1 caches (Weissmann, 1999-2004). The PA-8x00 series of chips was designed to be high performance, and delivering this performance required a larger cache with more bandwidth than could be included on the chip in the mid-1990s (Hunt). These off-chip caches, one for data and one for instructions, could be up to 2MB each for the PA-8200. The caches are direct-mapped and dual-ported (Weissmann, 1999-2004). Direct-mapping means that each block of memory, when pulled into the cache, can be mapped to only one block in the cache. This is cheaper for HP to implement, but it also offers lower performance (Null & Lobur, 2003). A dual-ported cache can feed data to two load/store units at the same time, which increases performance.
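The direct-mapping rule described above can be sketched as simple address arithmetic. The 32-byte line size matches one of the line sizes listed in the Appendix; the number of lines and the function names are hypothetical values chosen for illustration.

```python
# Sketch: where a memory address lands in a direct-mapped cache.
# num_lines is a hypothetical cache size; line_size follows the Appendix.

def cache_line_index(address, line_size=32, num_lines=1024):
    """Each memory block maps to exactly one cache line."""
    block = address // line_size
    return block % num_lines


def cache_tag(address, line_size=32, num_lines=1024):
    """The tag tells apart the different blocks that share a line."""
    block = address // line_size
    return block // num_lines


if __name__ == "__main__":
    a = 0x1000
    b = a + 32 * 1024  # exactly one cache size further on
    print(cache_line_index(a), cache_line_index(b))  # same line
    print(cache_tag(a), cache_tag(b))                # different tags
```

Two addresses that are one cache size apart compete for the same line, so a program that alternates between them evicts one block every time it touches the other. This is the performance cost of direct-mapping.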
Starting with the PA-8500, HP moved the L1 caches onto the chip. The caches were moved on-chip because they are cheaper and use fewer I/O resources than the previously used off-chip RAM chips (Hunt et al.). These on-chip caches are 4-way set associative, meaning that the cache is divided into sets of 4 blocks each. A memory block can be placed in any of the 4 blocks of the set it maps to, which means that blocks are replaced less often than with direct-mapping (Null & Lobur, 2003). These caches are only single-ported, though, as this saves space on the processor core (Weissmann, 1999-2004). The L1 caches of the PA-8500, PA-8600, PA-8700, and PA-8800 are all arranged in the same way, but increase in size with later models. This diagram shows the PA-8700 L1 cache organization (Hewlett Packard, 2000).
The L1 data cache is separated into odd and even double-word caches, each with a 64-bit pipe out. Both of these caches are divided into four arrays of equal size. This allows for the 4-way set associative nature of the cache. The tags for each line of cache are held in the tag array. The L1 instruction cache is also divided into four arrays of equal size, with 2 smaller arrays in each. These 4 arrays all have 128-bit pipes to the multiplexer, which can then transmit 4 instructions per cycle to the Instruction Fetch Unit (IFU), assuming one instruction comes from each array (Hewlett Packard, 2000).
The PA-8800 also implements a 32MB off-chip L2 cache, which is shared by the two logic cores. This, combined with the 1.5MB of L1 data cache and 1.5MB of L1 instruction cache, gives the PA-8800 the ability to hold even relatively large applications entirely in cache (Lostcircuits, 2001).
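The benefit of 4-way set associativity over direct-mapping, as discussed in this section, can be demonstrated with a small simulation. This is a sketch under stated assumptions: the cache is tiny (8 lines) so the effect is easy to see, replacement is true LRU (the text mentions only a quasi-LRU policy for some models), and the class name is hypothetical. Setting `ways=1` gives a direct-mapped cache.

```python
# Sketch: direct-mapped versus 4-way set-associative cache behaviour.
# Tiny hypothetical cache with true LRU replacement, for illustration only.
from collections import OrderedDict


class SetAssociativeCache:
    def __init__(self, num_lines=8, ways=4, line_size=32):
        if num_lines % ways != 0:
            raise ValueError("num_lines must be a multiple of ways")
        self.ways = ways
        self.line_size = line_size
        self.num_sets = num_lines // ways
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss."""
        block = address // self.line_size
        cache_set = self.sets[block % self.num_sets]
        tag = block // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)      # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)   # evict the least recently used
        cache_set[tag] = True
        return False


if __name__ == "__main__":
    # Alternate between two addresses exactly one cache size (256 bytes) apart.
    trace = [0, 256] * 4
    for ways in (1, 4):
        cache = SetAssociativeCache(ways=ways)
        for address in trace:
            cache.access(address)
        print(f"{ways}-way: {cache.misses} misses")
```

With one way, the two blocks keep evicting each other and every access misses. With four ways, both blocks fit in the same set and only the first access to each one misses.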
Physical Architecture
Physically, the architecture of the PA-8x00 processors has not changed much since the introduction of the PA-8000. Here is a block diagram of the PA-8000 (Hunt).
The architecture of the PA-8000 is fairly simple. It contains two 64-bit integer ALUs, two floating point units (FPUs), two load/store units, two shift/merge units, and two divide/square root units (Hunt). Both of the FPUs have thirty-two 64-bit registers, and the ALUs can take advantage of thirty-two 64-bit registers, as well (Gwennap, 1994). The IFU can fetch up to 4 instructions per cycle, contains the BHT and Branch Target Address Cache (BTAC), and connects to the L1 instruction cache, the system bus/memory, and the sort unit. The BTAC stores addresses for predicted branches, and works with the BHT in branch prediction. The sort unit controls the flow of instructions into the two buffers. The 28-entry ALU and memory buffers create the 56-entry IRB, used to perform OoO execution. These feed instructions to the processing units (Hunt). The rename registers hold the results from the processing units and from L2 data cache for use in other pending instructions (Gwennap, 1994). Once all of the preceding instructions have been completed, an instruction is retired. This process clears the instruction from the IRB and moves the result from the rename registers to the architected registers (Hunt).
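The retirement rule described above (an instruction retires only once all preceding instructions have completed) can be sketched as a running maximum over completion times. This is a simplified model: it ignores any limit on how many instructions can retire per cycle, and the function name and cycle numbers are hypothetical.

```python
# Sketch: in-order retirement of instructions that complete out of order.
# Ignores retire bandwidth; cycle numbers are made up for illustration.

def retire_cycles(completion_cycles):
    """Given completion cycles in program order, return the earliest
    cycle at which each instruction may retire."""
    retired = []
    latest = 0
    for done in completion_cycles:
        # An instruction cannot retire before every earlier one has completed.
        latest = max(latest, done)
        retired.append(latest)
    return retired


if __name__ == "__main__":
    # The first instruction is slow; the next two finish early but must wait.
    print(retire_cycles([3, 1, 1, 4]))
```

Results that finish early wait in the rename registers until the older instruction completes, which is how the architected registers always reflect program order.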
The architecture did not change until the PA-8700. Here is a block diagram of the PA-8700 (Johnson, 2001).
The only real difference between the architecture of the PA-8000 and that of the PA-8700 is the use of on-chip L1 cache and the inclusion of an interface to off-chip L2 cache.
The PA-8800 is the latest member of the PA-8x00 family of processors. Here is a block diagram of the PA-8800 (Lostcircuits, 2001).
The PA-8800 architecture is simply two PA-8700 processors on the same chip, linked together and linked with an off-chip 32MB L2 cache (Lostcircuits, 2001).
Conclusion
In conclusion, 64-bit computing has come a long way. It provides great performance increases over 32-bit computing. Hewlett Packard's entry into 64-bit computing, the PA-RISC 2.0 architecture, has grown quite a bit since its inception. From the PA-8000 with its off-chip L1 cache to the dual-core PA-8800, HP has developed a family of processors with competitive features and a unique architecture.
Appendix
Table 1: The following table summarizes the sources of increases in performance and scalability associated with 64-bit computing by type of application.
(Hewlett-Packard, 2004)
Example: Sources of Performance & Scalability gains
→ Large databases
● Larger memory allocation per user
● Many more users
● Large file implementations
● Reduced swapping
→ Decision support
● Direct addressing
● Reduced swapping
● Large file implementations
→ Technical applications
● Large process data space
● More available shared memory segments
● Reduced swapping
● High-precision arithmetic
Table 2: Features of each PA-8x00 processor
Features (table columns, in order):
1. PA-RISC V2.0 64-Bit
2. 10 Function Units, 2 Integer ALUs, 2 Shift/Merge Units
3. 2 Complete Load/Store Pipelines, 2 FP Multiply/Accumulate Units, 2 FP Divide/Square Root Units
4. 4-way superscalar
5. 2 Address Adders
6. 96-entry fully-associative dual-ported TLB
7. TLB Miss Penalty of 61 Cycles
8. 120-entry fully-associative dual-ported TLB
9. 160-entry fully-associative dual-ported TLB
10. 240-entry fully-associative dual-ported TLB
11. 32-entry BTAC (Branch Target Address Cache)
12. 42-entry BTAC (Branch Target Address Cache)
13. 256-entry BHT (Branch History Table)
14. 1024-entry BHT (Branch History Table)
15. 2048-entry BHT (Branch History Table)
16. Dynamic and static branch prediction modes
Marks per processor (in column order; blank columns are not shown):
PA-8000 X X X X X X X X X X
PA-8200 X X X X X X X X X
PA-8500 X X X X X X X X X
PA-8600 X X X X X X X X X
PA-8700 X X X X X X X X X
PA-8800 X X X X X X X X X
Table 2.2: This table is a continuation of Table 2.
Features (table columns, in order):
1. Off-chip L1 caches up to 1MB I and 1MB D, realized in synchronous 6.7ns (150MHz) late-write 1Mb SRAMs, one cycle latency
2. Off-chip L1 caches up to 2MB I and 2MB D, realized in synchronous 5ns (200MHz) late-write 4Mb SRAMs, one cycle latency
3. On-chip L1 caches 0.5MB I and 1MB D, each 4-way set associative
4. On-chip L1 caches 0.75MB I and 1.5MB D, each 4-way set associative, implemented in independent 0.75MB banks
5. Caches are direct-mapped and dual-ported
6. 32 or 64 Byte cache line size
7. Data cache prefetching
8. Supports up to 1 TB of physically addressable memory (40-bit physical addresses)
9. Supports up to 16 TB of physically addressable memory (44-bit physical addresses)
10. 56-entry instruction queue/reorder buffer (IRB)
11. Each instruction includes five predecode bits
12. Quasi LRU replacement policy for the instruction cache
13. Quasi LRU replacement policy for both the instruction and data cache
14. Bi-endian support
15. Support for hardware lock-stepping, i.e. operating multiple chips in parallel to detect faults
16. Runway system/memory bus, 120MHz, 64-bit wide, featuring split transactions and glueless multiprocessing. Max. throughput of 960MB/s
Marks per processor (in column order; blank columns are not shown):
PA-8000 X X X X X X
PA-8200 X X X X X X
PA-8500 X X X X X
PA-8600 X X X X X X
PA-8700 X X X X X X X X
PA-8800 X X X X X X X X
Table 2.3: This table is a continuation of Table 2.
Features (table columns, in order):
1. Runway system/memory bus, 125MHz, 64-bit, DDR (double data rate), ~2GB/s peak bandwidth
2. Runway system/memory bus, 125MHz, 64-bit, DDR (double data rate), ~2GB/s peak bandwidth
3. Up to 180MHz frequency with 3.3V core voltage
4. Up to 300MHz frequency with 3.3V core voltage
5. Up to 440MHz frequency with 2.0V core voltage
6. Up to ~550MHz frequency with 2.0V core voltage
7. Up to 750MHz (875MHz on the PA-8700+) frequency with 1.5V core voltage
8. 17.7 x 19.6 mm² die, 4'500'000 FETs, 0.5 micron, 5-layer metal CMOS packaged in a 1,085-pin flip-chip LGA package
9. 21.3 x 22.0 mm² die, 140'000'000 FETs, 0.25 micron, 5-layer metal CMOS packaged in a 544-pin LGA package
10. 16.0 x 19.0 mm² die, 186'000'000 FETs, 0.18 micron, 7-layer SOI CMOS packaged in a 544-pin LGA package
11. SPEC95 int/fp: 11.8/20.2
12. SPEC95 int/fp: 15.5/25.0
13. SPEC95 int/fp: 31.8/47.2
14. SPEC2000 int/fp: 125/153
15. SPEC2000 int/fp: 165/189
16. SPEC2000 int/fp: 338/357
Marks per processor (in column order; blank columns are not shown):
PA-8000 X X X X
PA-8200 X X X X
PA-8500 X X X X X
PA-8600 X X X
PA-8700 X X X
PA-8800 X X X
Works Cited
64-bit Computing re-examined. (2002, August). Network Magazine. http://www.networkmagazineindia.com/200208/focus2.shtml
Allison, Andrew. (1995, April). High-End Computing: Hope And Reality Of 64-Bit. InformationWeek. http://www.informationweek.com/524/24uwfw.htm
Bourekas, Phil (1999, January). 64-bit features give clout to 64-bit chips. EE Times. http://www.eetimes.com/article/showArticle/jhtml?/articleId+18300800
Downey, Tim (2000). “Branch Prediction.” http://www.cs.fiu.edu/~downeyt/cop3402/prediction.html
Foley, John. (1996, January). High-Speed Processors: Plugging In 64-Bit Chips. InformationWeek. http://www.informationweek.com/560/60ht64b.htm
Garvey, Martin. (1998, April). 64-Bit Computing Takes Off. InformationWeek. http://www.informationweek.com/678/78iubit.htm
Gwennap, Linley (1994). “PA-8000 Combines Complexity and Speed.” The Insiders' Guide to Microprocessor Hardware. 14 Nov. 1994.
Hayes, Mary. (1996, June). HP 64-Bit Workstations Ready. InformationWeek. http://www.informationweek.com/582/82iubit.htm
Hewlett Packard (2000). “PA-RISC 8x00 Family Microprocessors with Focus on PA-8700.” URL: http://www.cpus.hp.com/technical_references/PA-8700wp.pdf
Hewlett-Packard (2004). What is 64-bit computing? http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,989,00.html
Hunt, Doug. “Advanced Performance Features of the 64-bit PA-8000.” http://www.cpus.hp.com/technical_references/advperf.shtml
Hunt, D., Lesartre, G. “PA-8500: The Continuing Evolution of the PA-8000 Family.” http://www.cpus.hp.com/technical_references/8500.shtml
Jacobs, April. (1998, April). 64-bit computing. http://www.computerworld.com/hardwaretopics/hardware/story10,10808,43552,00.html
Johnson, David (2001). “HP’s Mako Processor.” http://www.cpus.hp.com/technical_references/mpf_2001.pdf
Kane, Gerry (1996). “PA-RISC 2.0 Architecture.” http://h21007.www2.hp.com/dspp/files/unprotected/parisc20/PA_1_overview.pdf
Lostcircuits (2001). “HP PA-8800 RISC Processor.” http://www.lostcircuits.com/cpu/hp_pa8800/
Mainelli, Tom. (2003, July). Are You Ready for a 64-Bit PC? PCWorld. http://www.pcworld.com/news/article/0,aid,111508,00.asp
McGee, Marianne Kolbasuk & Panettieri, Joseph C. (1995, December). The Push To 64-Bit Systems: Top vendors plan move to advanced microprocessors. InformationWeek. http://www.informationweek.com/558/58/iubit.htm
Null, L., Lobur, J. (2003). Computer Organization and Architecture. Sudbury, MA: Jones and Bartlett Publishers. QZ76.9.C643 N85 2003. ISBN 0-7637-0444-X.
Webopedia (2001). “What is superscalar?” http://www.webopedia.com/TERM/S/superscalar.html
Weissmann, Paul (1999-2004). “The OpenPA Project.” http://www.openpa.net