Scalar Operand Networks for Tiled Microprocessors
![Page 1: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/1.jpg)
Scalar Operand Networks for Tiled Microprocessors
Michael Taylor
Raw Architecture Project
MIT CSAIL
(now at UCSD)
![Page 2: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/2.jpg)
Until 3 years ago, computer architects used the N-way superscalar (or VLIW) to encapsulate the ideal for a parallel processor: nearly "perfect" but not attainable.

| | Superscalar |
|---|---|
| "PE"->"PE" communication | Free |
| Exploitation of parallelism | Implicit (hw scheduler or compiler) |
| Clean semantics | Yes |
| Scalable | No |
| Power efficient | No |
![Page 3: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/3.jpg)
What’s great about superscalar microprocessors? It’s the networks!
Fast, low-latency, tightly-coupled networks (0-1 cycles of latency, no occupancy)
- For lack of a better name, let's call them Scalar Operand Networks (SONs)
- Can we combine the benefits of superscalar communication with multicore scalability?
- Can we build Scalable Scalar Operand Networks?
(I agree with Jose: "We need low-latency tightly-coupled … network interfaces" – Jose Duato, OCIN, Dec 6, 2006)
mul $2,$3,$4
add $6,$5,$2
![Page 4: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/4.jpg)
The industry shift toward Multicore: attainable but hardly ideal

| | Superscalar | Multicore |
|---|---|---|
| "PE"->"PE" communication | Free | Expensive |
| Exploitation of parallelism | Implicit | Explicit |
| Clean semantics | Yes | No |
| Scalable | No | Yes |
| Power efficient | No | Yes |
![Page 5: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/5.jpg)
What we'd like: neither superscalar nor multicore
- Superscalars have fast networks and great usability
- Multicore has great scalability and efficiency
![Page 6: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/6.jpg)
Why communication is expensive on multicore
[Figure: a message travels from Multiprocessor Node 1 to Multiprocessor Node 2. Labels: send overhead (send occupancy, send latency), transport cost, receive overhead (receive occupancy, receive latency).]
![Page 7: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/7.jpg)
Multiprocessor SON Operand Routing
Multiprocessor Node 1
- send occupancy: destination node name, sequence number, value, launch sequence
- send latency: commit latency, network injection
![Page 8: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/8.jpg)
Multiprocessor SON Operand Routing
Multiprocessor Node 2
- receive occupancy: receive sequence, demultiplexing, branch mispredictions
- receive latency: injection cost
Similar overheads apply to shared-memory multiprocessors: store instruction, commit latency, spin locks (plus attendant branch mispredictions).
![Page 9: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/9.jpg)
Defining a figure of merit for scalar operand networks
5-tuple <SO, SL, NHL, RL, RO>:
Send Occupancy
Send Latency
Network Hop Latency
Receive Latency
Receive Occupancy
Tip: Ordering follows timing of message from sender to receiver
We can use this metric to quantitatively differentiate SONs from existing multiprocessor networks…
![Page 10: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/10.jpg)
Impact of Occupancy (“o” = so+ro)
if (o * “surface area” > “volume”)
not worth it to offload: overhead too high
(parallelism too fine-grained)
Impact of Latency
The lower the latency, the less work needed to keep myself busy waiting for the answer.
not worth it to offload: could have done it myself faster
(not enough parallelism to hide latency)
[Figure: Proc 0 and Proc 1 timelines, with an idle stretch labeled "nothing to do".]
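The two rules of thumb on this slide can be written down as a small sketch. The reading of "surface area" as the number of operands exchanged and "volume" as the cycles of offloaded work is an assumption, as are the function names and the sample numbers; the slide itself only gives the inequality.

```python
def worth_offloading(occupancy, surface_area, volume):
    """Occupancy rule from the slide: skip the offload when o * surface_area > volume.

    occupancy    -- o = send occupancy + receive occupancy, in cycles per operand
    surface_area -- assumed here to mean the number of operands exchanged
    volume       -- assumed here to mean the cycles of work being offloaded
    """
    return occupancy * surface_area <= volume


def idle_cycles(latency, independent_work):
    """Cycles the sender sits with nothing to do while waiting for the answer.

    independent_work is the amount of other work (in cycles) available to
    overlap with the round trip; this model is an illustrative assumption.
    """
    return max(0, latency - independent_work)


# Hypothetical numbers: 4 operands exchanged, 20 cycles of offloaded work.
high_occupancy = worth_offloading(occupancy=6, surface_area=4, volume=20)
zero_occupancy = worth_offloading(occupancy=0, surface_area=4, volume=20)
stall_long = idle_cycles(latency=28, independent_work=10)
stall_short = idle_cycles(latency=3, independent_work=10)
print(high_occupancy, zero_occupancy, stall_long, stall_short)
```

With these made-up inputs, the high-occupancy case fails the test while the zero-occupancy case passes, and only the long-latency case leaves the sender idle.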
![Page 11: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/11.jpg)
The interesting region
Power4 <2, 14, 0, 14, 4> (on-chip)
Superscalar <0, 0, 0, 0, 0> (not scalable)
![Page 12: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/12.jpg)
Tiled Microprocessors (or "Tiled Multicore")

| | Superscalar | Multicore | Tiled Multicore (w/ scalable SON) |
|---|---|---|---|
| PE-PE communication | Free | Expensive | Cheap |
| Exploitation of parallelism | Implicit | Explicit | Both |
| Scalable | No | Yes | Yes |
| Power efficient | No | Yes | Yes |
![Page 13: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/13.jpg)
Tiled Microprocessors (or "Tiled Multicore")

| | Superscalar | Multicore | Tiled Multicore |
|---|---|---|---|
| ALU-ALU communication | Free | Expensive | Cheap |
| Exploitation of parallelism | Implicit | Explicit | Both |
| Scalable | No | Yes | Yes |
| Power efficient | No | Yes | Yes |
![Page 14: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/14.jpg)
Transforming from multicore or superscalar to tiled
- Superscalar -> Tiled: add scalability
- CMP/multicore -> Tiled: add scalable SON
![Page 15: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/15.jpg)
The interesting region
Power4 <2, 14, 0, 14, 4> (on-chip)
Raw <0, 0, 1, 2, 0>
Tiled "Famous Brand 2" <0, 0, 1, 0, 0>
Superscalar <0, 0, 0, 0, 0> (not scalable)
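To make the comparison concrete, the sketch below turns each 5-tuple into a single end-to-end operand cost. The deck only says that the tuple ordering follows the message's timeline; summing the components, with the network hop latency multiplied by the hop count, is an assumption made here for illustration, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SonTuple:
    """The 5-tuple <SO, SL, NHL, RL, RO>, all in cycles."""
    so: int   # send occupancy
    sl: int   # send latency
    nhl: int  # network hop latency, per hop
    rl: int   # receive latency
    ro: int   # receive occupancy

    def end_to_end(self, hops):
        # Assumed cost model: components add up along the message's path.
        return self.so + self.sl + self.nhl * hops + self.rl + self.ro


# Tuples as listed on the slide.
POWER4 = SonTuple(2, 14, 0, 14, 4)
RAW = SonTuple(0, 0, 1, 2, 0)
SUPERSCALAR = SonTuple(0, 0, 0, 0, 0)

for name, son in (("Power4", POWER4), ("Raw", RAW), ("Superscalar", SUPERSCALAR)):
    print(name, son.end_to_end(hops=1))
```

Under this assumed model a one-hop operand costs 34 cycles with the Power4 tuple, 3 with the Raw tuple, and 0 with the superscalar tuple.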
![Page 16: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/16.jpg)
Scalability Problems in Wide Issue Microprocessors
[Figure: wide-issue microprocessor with Control, Wide Fetch (16 inst), Unified Load/Store Queue, PC, RF, 16 ALUs, and a Bypass Net.]
![Page 17: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/17.jpg)
Area and Frequency Scalability Problems
[Figure: N ALUs (16 shown) connected through a Bypass Net to the RF, with area annotations ~N^3 and ~N^2.]
Ex: Itanium 2
Without modification, freq decreases linearly or worse.
![Page 18: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/18.jpg)
Operand Routing is Global
[Figure: 16 ALUs sharing one Bypass Net and RF, with a ">>" instruction and a "+" instruction mapped to ALUs.]
![Page 19: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/19.jpg)
Idea: Make Operand Routing Local
[Figure: 16 ALUs with the Bypass Net and RF.]
![Page 20: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/20.jpg)
Idea: Exploit Locality
[Figure: 16 ALUs with the RF and Bypass Net.]
![Page 21: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/21.jpg)
[Figure: 16 ALUs and the RF.]
Replace the crossbar with a point-to-point, pipelined, routed scalar operand network.
![Page 22: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/22.jpg)
[Figure: 16 ALUs and the RF, with a ">>" instruction and a "+" instruction mapped to ALUs.]
Replace the crossbar with a point-to-point, pipelined, routed scalar operand network.
![Page 23: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/23.jpg)
Operand Transport Scaling – Bandwidth and Area

For N ALUs and N^(1/2) bisection bandwidth:

| | Un-pipelined crossbar bypass | Point-to-point routed mesh network |
|---|---|---|
| Local BW | ~ N^(1/2) | ~ N |
| Area | ~ N^2 | ~ N |

Notes: the crossbar column is as in a conventional superscalar; the mesh column scales as 2-D VLSI.
We can route more operands per unit time if we are able to map communicating instructions nearby.
![Page 24: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/24.jpg)
Operand Transport Scaling - Latency
Time for operand to travel between instructions mapped to different ALUs.

| | Un-pipelined crossbar | Point-to-point routed mesh network |
|---|---|---|
| Non-local placement | ~ N | ~ N^(1/2) |
| Locality-driven placement | ~ N | ~ 1 |

Latency bonus if we map communicating instructions nearby so communication is local.
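The mesh column of the latency comparison can be checked numerically. The sketch below assumes a square mesh of N ALUs with one hop per link and measures distance in Manhattan hops; it computes the mean distance between two uniformly random ALUs (non-local placement) and contrasts it with a fixed one-hop distance for neighbors (locality-driven placement). The function name and the one-hop constant are assumptions for illustration.

```python
import math
from itertools import product


def mean_hops_random(n_alus):
    """Mean Manhattan distance between two uniformly random ALUs on a square mesh.

    Pairs are ordered and include an ALU paired with itself (distance 0).
    """
    k = math.isqrt(n_alus)
    if k * k != n_alus:
        raise ValueError("n_alus must be a perfect square")
    coords = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for ax, ay in coords for bx, by in coords)
    return total / len(coords) ** 2


# Assumption: locality-driven placement puts producer and consumer on adjacent ALUs.
LOCAL_HOPS = 1

for n in (16, 64, 256):
    print(n, mean_hops_random(n), LOCAL_HOPS)
```

Going from 16 to 64 ALUs raises the random-placement mean from 2.5 to 5.25 hops, roughly doubling as N quadruples, which is the ~N^(1/2) growth in the table, while the neighbor distance stays constant.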
![Page 25: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/25.jpg)
Distribute the Register File
[Figure: 16 ALUs; the single RF is split into 16 RFs, one per ALU.]
![Page 26: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/26.jpg)
[Figure: 16 ALUs, each with its own RF, marked SCALABLE; Control, Wide Fetch (16 inst), Unified Load/Store Queue, and PC are shown alongside.]
![Page 27: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/27.jpg)
More Scalability Problems
[Figure: Control, Wide Fetch (16 inst), Unified Load/Store Queue, PC.]
![Page 28: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/28.jpg)
Distribute the rest: Raw – a Fully-Tiled Microprocessor
[Figure: 16 ALUs, each with its own RF, PC, I$, and D$; labels for the centralized Control, Wide Fetch (16 inst), Unified Load/Store Queue, and PC also appear.]
![Page 29: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/29.jpg)
Tiles!
[Figure: 16 tiles, each containing an ALU, RF, PC, I$, and D$.]
![Page 30: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/30.jpg)
Tiles!
![Page 31: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/31.jpg)
Tiled Microprocessors
-fast inter-tile communication through SON
-easy to scale (same reasons as multicore)
![Page 32: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/32.jpg)
Outline
1. Scalar Operand Network and Tiled Microprocessor intro
2. Raw Architecture + SON
3. VLSI implementation of Raw, a scalable microprocessor with a scalar operand network.
![Page 33: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/33.jpg)
Raw Microprocessor
Tiled scalable microprocessor
Point-to-point pipelined networks
16 tiles, 16 issue
Each 4 mm x 4 mm tile:
MIPS-style compute processor
- single-issue 8-stage pipe
- 32b FPU
- 32K D Cache, I Cache
4 on-chip networks
- two for operands
- one for cache misses
- one for message passing
![Page 34: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/34.jpg)
Raw Microprocessor Components
[Figure: block diagram of one tile. Labels: Compute Processor (Fetch Unit, Instruction Cache, Data Cache, Functional Units, Execution Core, Intra-tile SON); Switch Processor (Instruction Cache, Static Router, Crossbar, Inter-tile SON); Generalized Transport Networks (Dynamic Router "GDN", Dynamic Router "MDN", Crossbar); Trusted Core; Untrusted Core; Inter-tile Network Links.]
![Page 35: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/35.jpg)
Raw Compute Processor Internals
[Figure: compute processor pipeline with stage labels RF, A, TL, M1, M2, F, P, E, U and registers r24, r25, r26, r27.]
Ex: fadd r24, r25, r26
![Page 36: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/36.jpg)
Tile-Tile Communication
add $25,$1,$2
![Page 37: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/37.jpg)
Tile-Tile Communication
add $25,$1,$2 Route $P->$E
![Page 38: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/38.jpg)
Tile-Tile Communication
add $25,$1,$2 Route $P->$E Route $W->$P
![Page 39: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/39.jpg)
Tile-Tile Communication
add $25,$1,$2
sub $20,$1,$25
Route $P->$E Route $W->$P
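The four-step sequence on these slides (producer `add`, two switch routes, consumer `sub`) can be mimicked with a toy model. Everything below is a sketch under assumptions: register 25 is treated as the network-mapped register because the slides use `$25` that way, the `Tile` class and its queue names are hypothetical, and operand values are made up. A real switch would stall on an empty queue, whereas this model simply raises `IndexError`.

```python
from collections import deque

NET_REG = 25  # assumption: $25 is the network-mapped register, as the slides suggest
OPPOSITE = {"E": "W", "W": "E"}


class Tile:
    """One tile: a compute processor plus a switch with per-direction input queues."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.switch_in = {"P": deque(), "E": deque(), "W": deque()}
        self.proc_in = deque()  # operands the switch has delivered to the processor
        self.neighbors = {}     # direction -> neighboring Tile

    def read(self, reg):
        # Reading the network register consumes the next delivered operand.
        return self.proc_in.popleft() if reg == NET_REG else self.regs[reg]

    def write(self, reg, value):
        # Writing the network register hands the value to the local switch.
        if reg == NET_REG:
            self.switch_in["P"].append(value)
        else:
            self.regs[reg] = value

    def alu(self, op, rd, rs, rt):
        a, b = self.read(rs), self.read(rt)
        self.write(rd, a + b if op == "add" else a - b)

    def route(self, src, dst):
        # One static-router instruction: move a word from input src to output dst.
        value = self.switch_in[src].popleft()
        if dst == "P":
            self.proc_in.append(value)
        else:
            self.neighbors[dst].switch_in[OPPOSITE[dst]].append(value)


west = Tile({1: 10, 2: 32})
east = Tile({1: 100})
west.neighbors["E"] = east
east.neighbors["W"] = west

west.alu("add", 25, 1, 2)   # add $25,$1,$2 : result goes to the west switch
west.route("P", "E")        # Route $P->$E  : west switch forwards it east
east.route("W", "P")        # Route $W->$P  : east switch delivers it to its processor
east.alu("sub", 20, 1, 25)  # sub $20,$1,$25: consumes the operand from the network
print(east.regs[20])
```

The producer never names the consumer and the consumer never names the producer; the path is fixed entirely by the two route instructions, which is why no header or demultiplexing step appears anywhere in the model.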
![Page 40: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/40.jpg)
Compilation
tmp3 = (seed*6+2)/3
v2 = (tmp1 - tmp3)*5
v1 = (tmp1 + tmp2)*3
v0 = tmp0 - v1
….
pval5=seed.0*6.0
pval4=pval5+2.0
tmp3.6=pval4/3.0
tmp3=tmp3.6
v3.10=tmp3.6-v2.7
v3=v3.10
v2.4=v2
pval3=seed.0*v2.4
tmp2.5=pval3+2.0
tmp2=tmp2.5
pval6=tmp1.3-tmp2.5
v2.7=pval6*5.0
v2=v2.7
seed.0=seed
pval1=seed.0*3.0
pval0=pval1+2.0
tmp0.1=pval0/2.0
tmp0=tmp0.1
v1.2=v1
pval2=seed.0*v1.2
tmp1.3=pval2+2.0
tmp1=tmp1.3
pval7=tmp1.3+tmp2.5
v1.8=pval7*3.0
v1=v1.8
v0.9=tmp0.1-v1.8
v0=v0.9
RawCC assigns instructions to the tiles, maximizing locality. It also generates the static router instructions that transfer operands between tiles.
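One way to see what "maximizing locality" buys is to count how many operands must cross between tiles under a given placement, since each crossing needs router instructions. The sketch below is not RawCC's algorithm; it only scores two hand-written placements of a few instructions taken from the slide. The function name, the two-tile setup, and both placements are assumptions for illustration.

```python
def cross_tile_operands(deps, placement):
    """Count producer->consumer edges whose endpoints sit on different tiles."""
    return sum(
        1
        for dest, sources in deps.items()
        for src in sources
        if placement[src] != placement[dest]
    )


# Dependences of two chains from the slide: each value lists the values it reads.
deps = {
    "seed.0": [],
    "pval1": ["seed.0"], "pval0": ["pval1"], "tmp0.1": ["pval0"],
    "pval5": ["seed.0"], "pval4": ["pval5"], "tmp3.6": ["pval4"],
}

# Hypothetical placement that keeps each chain on one tile.
local = {"seed.0": 0, "pval1": 0, "pval0": 0, "tmp0.1": 0,
         "pval5": 1, "pval4": 1, "tmp3.6": 1}

# Hypothetical placement that alternates tiles along each chain.
scattered = {"seed.0": 0, "pval1": 1, "pval0": 0, "tmp0.1": 1,
             "pval5": 1, "pval4": 0, "tmp3.6": 1}

print(cross_tile_operands(deps, local), cross_tile_operands(deps, scattered))
```

The chain-per-tile placement needs a single inter-tile transfer (the shared `seed.0`), while the alternating placement needs one for every dependence edge.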
![Page 41: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/41.jpg)
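One cycle in the life of a tiled micro
[Figure: one chip running several workloads at once: httpd; a 4-way automatically parallelized C program; a 2-thread MPI app; a direct I/O stream into the Scalar Operand Network; three mem blocks; idle tiles marked "Zzz...".]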
An application uses only as many tiles as needed to exploit the parallelism intrinsic to that application…
![Page 42: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/42.jpg)
One Streaming Application on Raw
[Figure: 16 tiles, Tile 0 through Tile 15, showing the application's layout.]
Very different traffic patterns than RawCC-style parallelization.
![Page 43: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/43.jpg)
Auto-Parallelization Approach #2: StreamIt Language + Compiler
[Figure: two stream graphs, "Original" and "After fusion", built from Splitter, FIRFilter, Vec Mult, Magnitude, Detector, and Joiner nodes.]
![Page 44: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/44.jpg)
End Results: auto-parallelized by MIT StreamIt to 8 tiles.
[Figure: resulting stream graph of FIRFilter and Joiner nodes plus fused Vec Mult/FIRFilter/Magnitude/Detector nodes.]
![Page 45: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/45.jpg)
AsTrO Taxonomy: Classifying SON diversity
- Assignment (Static/Dynamic): Is instruction assignment to ALUs predetermined?
- Transport (Static/Dynamic): Are operand routes predetermined?
- Ordering (Static/Dynamic): Is the execution order of instructions assigned to a node predetermined?
[Figure: 8 ALUs with example instructions (>>, +, %, &, /).]
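The three questions map naturally onto a small record type. The sketch below is illustrative only: the `Mode` and `AsTrO` names and the three-letter code are shorthand introduced here, not notation from the slides. The example entry follows the Compilation slide, where RawCC assigns instructions to tiles and generates the router instructions ahead of time; treating its ordering as static too is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STATIC = "S"
    DYNAMIC = "D"


@dataclass(frozen=True)
class AsTrO:
    """Answers to the three taxonomy questions for one design."""
    assignment: Mode  # is instruction assignment to ALUs predetermined?
    transport: Mode   # are operand routes predetermined?
    ordering: Mode    # is per-node execution order predetermined?

    def code(self):
        # Hypothetical three-letter shorthand, e.g. "SSS" or "DDD".
        return self.assignment.value + self.transport.value + self.ordering.value


# Compile-time scheduled design in the style described for RawCC (ordering assumed static).
compiled_style = AsTrO(Mode.STATIC, Mode.STATIC, Mode.STATIC)

# A design that decides all three at run time, for contrast.
fully_dynamic = AsTrO(Mode.DYNAMIC, Mode.DYNAMIC, Mode.DYNAMIC)

print(compiled_style.code(), fully_dynamic.code())
```

Because each of the three axes is binary, the taxonomy has eight possible classes.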
![Page 46: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/46.jpg)
Microprocessor SON diversity using AsTrO taxonomy
[Figure: decision tree branching on Assignment, then Transport, then Ordering (each Static or Dynamic), with leaves labeled RawDyn, Raw, Scale, TRIPS, ILDP, and WaveScalar.]
![Page 48: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/48.jpg)
Raw Chips
October 02
![Page 49: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/49.jpg)
Raw
- 16 tiles (16 issue)
- 180 nm ASIC (IBM SA-27E)
- ~100 million transistors
- 1 million gates
- 3-4 years of development
- 1.5 years of testing
- 200K lines of test code
- Core frequency: 425 MHz @ 1.8 V, 500 MHz @ 2.2 V
- Frequency competitive with IBM-implemented PowerPCs in same process
- 18 W average power
![Page 50: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/50.jpg)
Raw motherboard
Support Chipset implemented in FPGA
![Page 51: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/51.jpg)
![Page 52: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/52.jpg)
![Page 53: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/53.jpg)
A Scalable Microprocessor in Action
[Taylor et al, ISCA ’04]
![Page 54: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/54.jpg)
Conclusions
Scalability problems in general purpose processors can be addressed by tiling resources across a scalable, low-latency, low-occupancy scalar operand network (SON). These SONs can be characterized by a 5-tuple and the AsTrO classification.
The 180 nm 16-issue Raw prototype shows that the approach is feasible. 64+-issue is possible in today's VLSI processes.
Multicore machines could benefit by adding inter-node SON for cheap communication.
![Page 55: Scalar Operand Networks for Tiled Microprocessors](https://reader036.fdocuments.net/reader036/viewer/2022062321/5681360c550346895d9d83cc/html5/thumbnails/55.jpg)