Implementing a NoMC on the Gidel platform end-semester presentation
Technion – Israel Institute of Technology
Department of Electrical Engineering
High Speed Digital Systems Lab
Instructor: Evgeny Fiksman
Students: Meir Cohen, Daniel Marcovitch
Winter 2009
Table of Contents
• Project goals
• Previous router
• Our routers
• Software design
• Obstacles
• Testing
• Time tables
Project goals
• Implementing a parallel processing system which contains several NoCs, each chip containing several sub-networks of processors.
• Converting the existing router to support the Altera platform.
• Expanding the router to enable communication between similar sub-networks.
• Implementing a processor network which supports communication with the PC, enabling:
o Use of the PC's CPU as part of the processing network.
o Simple I/O between the PC and the rest of the processing network.
Top-level structure of the expanded network
Each white square represents a single FPGA on the Gidel board.
FPGA-FPGA and FPGA-PC routes go via designated routers (GW).
The GWs' design and protocols are the same as those of the internal routers.
Router from previous project
[Figure: Cross Bar – Low Level. Block diagram of the router: a Permission Unit with a Timer & Enable Unit, and four Port FSMs (Port #1 to Port #4). Each Port FSM connects to/from the FSL (Fsl_S_Data, Fsl_S_Read, Fsl_S_Control, Fsl_S_HasData, Fsl_M_Data, Fsl_M_Write, Fsl_M_Control, Fsl_M_Full) and to Control Bus I, Control Bus II and the 32-bit Data Bus through bus interfaces. Signals shown: Clk, Rst, Req, Dest, Premit, COMM, Bcast, BcastReq, BcastPriority.]
• Two main units: Permission Unit, Port FSM
• Time-limited Round Robin arbiter
• Port-to-port & broadcasting
• Smart connectivity: R – R, R – Core
• Modular design
Permission process
• Round Robin arbiter: service order according to a loop counter.
• Check that DEST is not busy.
• Permit for a 'time slot'.
• If a port is not requesting, service the next requesting port.
• BUSY and LAST writing ports are saved.
• Check messages for COMM and direct them to the relevant port according to the table.
• Broadcast priority enables only one broadcast at a time.
[Figure: Permission Unit. A controller with LAST WRITING PORT and BUSY PORTS registers, MUX 4x2 and MUX 4x1, a Timer & Enable Unit (Nxt, TimeOver), a COMMs table (COMM, CommDst) and a Bcast Priority Unit. Inputs Req1 to Req4, Dest and Bcast1 to Bcast4 come from the Port FSMs over Control Bus 2; the output is Premit.]
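The permission process above can be sketched in software. This is only an illustrative model, not the router's HDL: the class and method names are hypothetical, ports are numbered 0 to 3 instead of 1 to 4, and the Timer & Enable Unit is reduced to an explicit `release()` call that stands in for the end of a time slot.

```python
from dataclasses import dataclass, field

NUM_PORTS = 4  # the previous router has four ports


@dataclass
class PermissionUnit:
    """Toy model of a time-limited round-robin arbiter (hypothetical names)."""
    busy: list = field(default_factory=lambda: [False] * NUM_PORTS)
    last: int = NUM_PORTS - 1   # last writing port; chosen so port 0 is served first

    def grant(self, requests, dests):
        """Return (port, dest) of the next permitted request, or None.

        requests[i] is True when port i asks to write; dests[i] is its target.
        """
        for step in range(1, NUM_PORTS + 1):
            port = (self.last + step) % NUM_PORTS  # loop-counter service order
            if not requests[port]:
                continue                           # skip non-requesting ports
            dest = dests[port]
            if self.busy[dest]:
                continue                           # destination busy: try next port
            self.busy[dest] = True                 # BUSY is saved until the slot ends
            self.last = port                       # LAST writing port is saved
            return port, dest
        return None

    def release(self, dest):
        """Stand-in for the end of a time slot (TimeOver)."""
        self.busy[dest] = False
```

Starting the search one position after the last writing port is what keeps a single port from monopolising a destination once its slot expires.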
Our changes for the router
Changes:
• Fifth port
• Routing table
• Broadcast table
New router types:
• Local router (LR)
• Fabric router (FR)
• Primary/secondary interchip router (P/S-ICR)
• PC router (PCR)
Fifth port
[Figure: Cross Bar – Low Level block diagram of the previous router, with a 5th Port added.]
Just adding another port module to the ring…
Routing
Address: PC | C C | F F | L L (fields: chip, fabric, local; grouped into comm and rank)
Local router:
• Same comm: routing by rank.
• Other comms: to the 5th port.
Other routers:
• Routing by comm only.
Result: smaller routing tables
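The routing rule above can be sketched as follows. The bit layout is an assumption made for illustration: the two "L" bits are taken as the rank and all higher bits as the comm. The function names, the 0-based port numbering and the "rank i sits on port i" mapping are hypothetical as well.

```python
RANK_BITS = 2    # assumption: the two "L" bits form the rank
FIFTH_PORT = 4   # ports are numbered 0..4 here; index 4 is the 5th port


def split_address(addr):
    """Split an address into (comm, rank) under the assumed bit layout."""
    rank = addr & ((1 << RANK_BITS) - 1)
    comm = addr >> RANK_BITS
    return comm, rank


def local_router_port(addr, own_comm):
    """Local router: route by rank inside its own comm, else use the 5th port."""
    comm, rank = split_address(addr)
    if comm == own_comm:
        return rank          # assumption: rank i is attached to port i
    return FIFTH_PORT


def other_router_port(addr, comm_table):
    """Other routers: look up the output port by comm only."""
    comm, _rank = split_address(addr)
    return comm_table[comm]
```

Because a local router never needs entries for foreign comms and the other routers never need per-rank entries, each table stays small, which is the result the slide states.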
Routing
[Figure: Permission Unit block diagram, as on the permission process slide.]
Non-existing components to be added.
Broadcast table
[Figure: Cross Bar – Low Level block diagram with the broadcast table added.]
Broadcasting only to spanning tree branches.
Table tags branch ports with a '1' value: 0 1 1 0 1
Connected to the "Port FSM" unit of each port.
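A minimal sketch of how such a table could select the output ports of a broadcast. The function name and the 0-based port indices are hypothetical, and skipping the port the message arrived on is an assumption, made so that the message is not sent back along the same branch.

```python
def broadcast_ports(table, source_port):
    """Return the ports a broadcast is forwarded to.

    table[i] == 1 marks port i as a spanning-tree branch. The arrival port
    is skipped (an assumption) so the message does not travel backwards.
    """
    return [port for port, bit in enumerate(table)
            if bit == 1 and port != source_port]


# Table value shown on the slide, read here as ports 0..4.
table = [0, 1, 1, 0, 1]
forward_to = broadcast_ports(table, source_port=1)
```

Restricting broadcasts to spanning-tree branches means every router receives each broadcast once, since a tree has no cycles.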
Software design
Software layers
• Application layer: MPI functions interface
• Network layer: hardware-independent implementation of these functions
• Data layer: relies on command bit fields
• Physical layer: designed for FSL bus
[Figure: layer stack (Application layer, Network layer, Data layer, Physical layer) annotated with the changes listed below.]
Changes:
• Adjust to conform with the Altera interface.
• Use DMA transfers.
• Add asynchronous functions.
• Adjust for the new comm size.
Message Passing Flow
[Figure: Source Buffer, Network, Auxiliary Receive Buffer (Constant) and Destination Buffer, linked by DMA transfers.]
Sending
• MPI_Isend: only adds a send request to the sending list. Each entry holds: Destination, Tag, Buffer address, Size.
• DMA sends data asynchronously.
Receiving
• MPI_Irecv: only adds a receive request to the receiving list. Each entry holds: Source, Tag, Buffer address, Size.
• DMA receives data asynchronously.
• Transfer data into buffer in background.
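The flow above can be modelled in a few lines. This is a toy simulation, not an MPI binding and not the project's code: `Endpoint`, `Request` and `progress` are hypothetical names, a `bytearray` stands in for the "buffer address + size" pair, and `progress` plays the role of the background DMA engine.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)       # compare requests by identity, not by content
class Request:
    peer: int              # destination (send) or source (receive)
    tag: int
    buffer: bytearray      # stands in for "buffer address" + "size"


class Endpoint:
    """Toy model of the non-blocking flow (hypothetical, not real MPI)."""

    def __init__(self):
        self.send_list = deque()
        self.recv_list = deque()

    def isend(self, dest, tag, data):
        # Like MPI_Isend on the slide: only queue the request and return.
        req = Request(dest, tag, bytearray(data))
        self.send_list.append(req)
        return req

    def irecv(self, source, tag, size):
        # Like MPI_Irecv on the slide: only queue the request and return.
        req = Request(source, tag, bytearray(size))
        self.recv_list.append(req)
        return req


def progress(sender, sender_rank, receiver):
    """Stand-in for the background DMA: match and copy one message."""
    if not sender.send_list:
        return False
    send = sender.send_list[0]
    for recv in receiver.recv_list:
        if recv.peer == sender_rank and recv.tag == send.tag:
            n = min(len(send.buffer), len(recv.buffer))
            recv.buffer[:n] = send.buffer[:n]   # copy into the destination buffer
            sender.send_list.popleft()
            receiver.recv_list.remove(recv)
            return True
    return False
```

The point of the design is visible in the model: `isend` and `irecv` return immediately, and the data only moves when the background step runs.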
Obstacle 1 - Memory bottleneck
Each Nios uses ~13Kb of on-chip memory.
FPGA has only ~70Kb of on-chip memory.
Only 5 processors fit.
Solutions:
o Off-chip memory: slow.
• Reducing program footprint.
• Using a bigger FPGA for the whole network.
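The "only 5 processors" figure follows from integer division of the two approximate numbers above. A short worked check, using the slide's values as given (the variable names are illustrative):

```python
NIOS_FOOTPRINT = 13   # approximate on-chip memory per Nios, in the slide's units
FPGA_ONCHIP = 70      # approximate on-chip memory of the FPGA, same units

max_processors = FPGA_ONCHIP // NIOS_FOOTPRINT            # whole processors that fit
leftover = FPGA_ONCHIP - max_processors * NIOS_FOOTPRINT  # memory left unused

# Footprint each Nios would need for a sixth processor to fit.
footprint_for_six = FPGA_ONCHIP / 6

print(max_processors, leftover, round(footprint_for_six, 2))
```

With these numbers a sixth processor would need the per-processor footprint to drop below roughly 11.7 units, which is why reducing the program footprint appears among the solutions.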
Obstacle 2 - Cache coherency
[Figure: a DMA buffer shown against four cache lines; labels: Memory, Cache.]
Cache flush is necessary but not enough! Incoherency in unaligned cache lines.
Solutions:
o Not using cache: asynchronous system not effective.
o Disabling cache in buffer area: cannot use cache after DMA transfer.
• Align DMA buffers to cache lines (using memalign).
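The alignment requirement can be illustrated with plain address arithmetic. The 32-byte line size is an assumption (the real value depends on the processor configuration), and the helper names are hypothetical; in C the same effect is what `memalign` is used for on the slide.

```python
CACHE_LINE = 32  # bytes; an assumed line size, check the actual CPU configuration


def align_up(value, alignment=CACHE_LINE):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def shares_cache_line(addr, size, alignment=CACHE_LINE):
    """True if the buffer's first or last cache line also covers other data."""
    return addr % alignment != 0 or (addr + size) % alignment != 0


def aligned_buffer(addr, size, alignment=CACHE_LINE):
    """Return (start, length) of a buffer that owns whole cache lines."""
    return align_up(addr, alignment), align_up(size, alignment)
```

A buffer that starts or ends in the middle of a line shares that line with unrelated data, so a flush or invalidate of the line can clobber one side or the other. Rounding both the start address and the length to whole lines removes the sharing.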
Local router testing
[Figure: test setup with the local router, four Nios II processors each with a PIO, four Simple FIFO* blocks, and the testing program on the PC.]
* PIO to FIFO connector
• PIO outputs debug information, data sent/received and results.
• Test program prints the PIO data on screen.
• In simulation, PIO can be read directly from the wave.
Application
Multiple matrix multiplication.
[Figure: matrices of the form $\begin{pmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,0} & a_{m,1} & \cdots & a_{m,n} \end{pmatrix}$ arranged around four MUL units.]
Questions