1 A methodology for design space exploration of on-chip networks Luciano Lavagno Politecnico di...
-
Upload
paulina-small -
Category
Documents
-
view
221 -
download
0
Transcript of 1 A methodology for design space exploration of on-chip networks Luciano Lavagno Politecnico di...
1
A methodology for design space exploration of on-chip networks
Luciano LavagnoPolitecnico di Torino, Italy [email protected] Berkeley Labs, CA http://polimage.polito.it/~lavagno
Laura VanzagoSTMicroelectronics [email protected]
MPSOC summer school, Chateau de Pizay, July 2002
MPSOC 2002 - 2Luciano Lavagno ©
OutlineSystem-on-chip design flow
–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation
Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration
MPSOC 2002 - 3Luciano Lavagno ©
The System-On-Chip Design FlowSpecify:
–What does the customer really want?Architect:
–What is the most cost and performance effective architecture to implement it?
–What existing components can I adapt and re-use?Evaluate:
–What is the performance impact of a cheaper architecture? Implement:
–What can I generate automatically from libraries and customization?
Idea: separate computation, communication and performance
MPSOC 2002 - 4Luciano Lavagno ©
The System-On-Chip Design Flow
CommunicationRefinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4Flow To Implementation
MPSOC 2002 - 5Luciano Lavagno ©
The System-On-Chip Design Flow
ŒAnnotation
of architectural timing and energy
onto behavior
PerformanceSimulation
behavior annotated with architectural effects
ŽAnalyze / Visualize
Results
MPSOC 2002 - 6Luciano Lavagno ©
Functional Modeling
MPEG Decoder
VLD IDCT MC DISPLAYBAIZ,IQ
BAMEM
BAMEMM M M M MM
M
MPSOC 2002 - 7Luciano Lavagno ©
Communication Refinement
VLD IDCT MC DISPLAYBAIZ,IQ
BAMEM
BAMEMM M M M MM
M
VLD IDCT MC DISPLAYBAIZ,IQ
BAMEM
BAMEMM MM
M
REAS
BUS
M
SEG
M
REAS
M
M
SEG
M
SEG
M
REAS
MPSOC 2002 - 8Luciano Lavagno ©
Optimization
VLD IDCT DISPLAYBAIZ,IQ
BAMEMM MMC BA
MEMM
M
REAS
BUS
M
SEG
M
REAS
M
M
SEG
M
SEG
M
REAS
MPSOC 2002 - 9Luciano Lavagno ©
Functional modeling
CommunicationRefinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4Flow To Implementation
MPSOC 2002 - 10Luciano Lavagno ©
Architectural Modeling
CommunicationRefinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4Flow To Implementation
MPSOC 2002 - 11Luciano Lavagno ©
MappingCommunication
Refinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4Flow To Implementation
MPSOC 2002 - 12Luciano Lavagno ©
OutlineSystem-on-chip design flow
–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation
Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration
MPSOC 2002 - 13Luciano Lavagno ©
Separate Delay ModelSeparate Delay Model
Annotated IP Functional Model
my_ip() {f = x.read();r = f * k;__DelayCycles(2);y.write(r); }
IP Functional Modelmy_ip() {f = x.read();r = f * k;y.write(r); }
Delay Script// HW implemdelay() {input(x);run();delay(2.0 / cps);output(y); }
Inline Delay ModelInline Delay Model
Functional Modelmy_ip() {f = x.read();r = f * k;y.write(r); }
Performance ModelingCommunication
Refinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4Flow To Implementation
MPSOC 2002 - 14Luciano Lavagno ©
Software Performance Estimation
PerformanceEstimation
Compile
generated C andrun natively
ld ld op ld li op ts -- br
Analyse
basic blockscompute delays
Virtual MachineInstructions
ArchitectureCharacterization
ŽGenerate new C
with delay annotations
v__st_tmp = v__st;__DELAY(LI+LI+LI+LI+LI+LI+OPc);startup(proc);if (events[proc][0] & 1) {__DELAY(OPi+LD+LI+OPc+LD+OPi+OPi+IF); goto L16;}
ŒSpecify behavior
and I/O
ANSI CInput
v__st_tmp = v__st;startup(proc);if (events[proc][0] & 1) goto L16;
MPSOC 2002 - 15Luciano Lavagno ©
Communication refinement
P C
Module Interface
Bus independent Virtual Component Interface Write, Read (address, bus-able data chunk...) VCI
P C
VCI to Physical-Bus Wrapper
Physical Bus Transferse.g. Arbitrated PIBus protocol PHY
P C
Process
APPDelay Independent APIe.g. unbounded FIFO Write, Read (vector of «any» type )
HW/SW Independent System Communications e.g. Bounded FIFO
P C
Module
SYS
MPSOC 2002 - 16Luciano Lavagno ©
Communication refinement
fifowtr
dest
ITC
fifowriter
MEM
prod-ucer
chan.writer
wakeup
writer
filter
chan.reader
RTOSqueue
fiforeader
n
CPU
RTOS board support package
wrapper
hw tasksw task
Interrupt service routine
src
VCI
PHY
SYS
SYS
MPSOC 2002 - 17Luciano Lavagno ©
Communication RefinementA B
CPU Mem
RTOS Computation
mutex_lckmemcpy
signal
Post(5)
write
busRequest
arbiterReq/Release
busResponse
setEnabled
Value()
waitmemcpy
signal
read
busRequest
arbiterReq/Release
busResponse
SwMutexes
BusMaster SlaveAdapter
BusArbiter
MemoryAccess Arch. Services
SemaphoreProtectedSemProt_Send SemProt_Recv
Comm. Services
MPSOC 2002 - 20Luciano Lavagno ©
Communication abstraction layer refinement
HW SW
CommunicationRefinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4Flow To Implementation
RP E -LTPE nc oder
ChannelCoder
Interleaver
M odulator
Dem odulatorDe-interleaver
ChannelDec oder
RP E -LTPDec oder
F ro m R F
T o R F
S p e e c hO ut
P rotoc olS tac k
Con tro lData
M ult ip lex er
De-m ult ip lex er
S p e e c h In
C o n tro lD a ta
P rotoc olS tac kFigure 1
D SPD S P
Interna lRA M
BUS
M ic ro p ro c e s s o r RA MD e d ic ate dHard ware
Figure 2
Figure 3
Ad
dre
ss
de
co
de
r
Dri
ve
r
HW SW
SemProt_Send
A B
CPU Mem
RTOS
mutex_lock;
memcpy;
signal
Post(5)
write
busRequest
arbiterRequest/Release
busIndication
setEnabled
Value()
wait;
memcpy;
signal
read
SemaphoreProtected
busRequest
arbiterRequest/Release
busIndication
User Visible SwMutexes
BusMaster SlaveAdapter
BusArbiter
MemoryAccess Architecture Services
SemProt_Send SemProt_Recv
Pattern Services
MPSOC 2002 - 21Luciano Lavagno ©
Performance simulation by mapping
F1 F2 F3 Function
A1 A2 A3Architecture
Mapping
F1 F2 F3Comm Comm
Arbiter
Performancesimulation
model
MPSOC 2002 - 22Luciano Lavagno ©
Performance simulation
Software Gantt Charts
Architecture Analysis
CommunicationRefinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4Flow To Implementation
MPSOC 2002 - 23Luciano Lavagno ©
Exploring Design Trade Offs
Iteration through different mapping experimentsGradual refinement of the designEvaluation
–of the "refined" design–of system performance after implementation
Export implementation to–Testbench and top-level netlist–Hardware netlist–Software RTOS customization
MPSOC 2002 - 24Luciano Lavagno ©
OutlineSystem-on-chip design flow
–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation
Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration
MPSOC 2002 - 25Luciano Lavagno ©
Implementation by mapping
F1 F2 F3
Intfc Intfc
BUS
F1 F2 F3
A1 A2 A3
Function
Architecture
Intfc Intfc
Mapping
Implementationmodel
MPSOC 2002 - 26Luciano Lavagno ©
CommunicationRefinement
Mapping
SystemBehavior
SystemArchitecture
PerformanceSimulation
BehaviorSimulation
21
3
4
Flow to Implementation
Flow To Implementation
Flow To Implementation
HardwareTop-level
SystemTest Bench
Softwareon RTOS
System ExplorationCommunication Refinement
Export refined design to co-verification and implementation tools
MPSOC 2002 - 27Luciano Lavagno ©
Flow to Implementation
Architecture
TaskX
TaskY
RTOSISRW
CPUASIC
bus
RAMROM
Behavior
A
B
C
D
E F
G
IJ
H
MPSOC 2002 - 28Luciano Lavagno ©
TaskX
TaskY
RTOS
CPU
A
B
C
D
E
Customizing RTOS$<StandardHeader,'RTOS rootialization'>$<RtosAndCpuIncludes>$<BoardSupportPackageIncludes>$<LynxSwIncludes>/* Device Driver includes/device handle decls */$<LynxDriverIncludes>$<LynxDeviceHandleDecs>/* Mutex semaphore per protected data-buffer */$<MutexVariableDefinitions>/* Define an identifier for each task */$<TaskIdDefinitions>
void root(void ) {/* Mutex semaphore per protected data-buffer */ $<CreateMutexes>/* Create each software task */ $<CreateTasks>/* Register interrupt service routines */ $<RegisterInterrupts>/* Schedule each software task. */ $<StartTasks>/* Delete or suspend the root task. */ $<DeleteSelf>}
#include <psos.h>#include "init.h"#include "tasks.h"/* Device Driver includes/device handle decls */#include "drivers.h"/* Mutex semaphore per protected data-buffer */unsigned long I_24_I_50_MainDisp_mutex;unsigned long I_24_I_50_SubDisp_mutex;/* Define an identifier for each task */unsigned long task_I_13_I_6__ready;unsigned long task_I_26__ready;
void root(void ) {/* Mutex semaphore per protected data-buffer */k_fatal(0x20000004, K_LOCAL);k_fatal(0x20000004, K_LOCAL);/* Create each software task */ if (t_create("T0", 10, 1024, 1024, T_LOCAL|,&task_I_13_I_6__ready)) k_fatal(0x20000001, if (t_create("T1", 11, 1024, 1024, T_LOCAL|,&task_I_26__ready)) k_fatal(0x20000001,…
MPSOC 2002 - 29Luciano Lavagno ©
TaskX
TaskY
RTOS
CPU
A
B
C
D
E
Creating SW Communication Code #include <psos.h> #define LYNX_BEGIN_ATOMIC() OSDisableInt() #define LYNX_END_ATOMIC() OSEnableInt() #define LYNX_SET_PENDING(taskEventName) ev_receive(allevents, \
(EV_ANY || EV_NOWAIT), 0, events_r) #define LYNX_SET_READY(taskEventName) ev_send(taskEventName, allevents) #define LYNX_MUTEX_REQUEST(mutex) sm_p(mutex, SM_WAIT, 0) #define LYNX_MUTEX_RELEASE(mutex) sm_v(mutex) #define LYNX_ISR_ENTER() OSEnterISR() #define LYNX_ISR_EXIT() OSExitISR()
void lynx_Run(lynx_inst_ident_t inst_id){ char buffinput[10] = ""; if (lynx_Enabled(inst_id,in)){ lynx_Value(inst_id, in, &buffinput); ... behaviour d functionality .... lynx_Post(inst_id, out, &buffinput);}#define I_31_I_64_Value_MainDisp(inst_id, buff_p) \
( ( \ (LYNX_MUTEX_REQUEST(I_3_DM_1_X_mutex)), \ (LYNX_MEMCPY(buff_p,&I_3_DM_1_X,sizeof(I_3_DM_1_X))), \ (Probe_I_31_I_64_Value_MainDisp), \ (LYNX_MUTEX_RELEASE(I_3_DM_1_X_mutex)) ), &I_3_DM_1_X\)
MPSOC 2002 - 30Luciano Lavagno ©
Creating HW Communication CodeD G
I
J
H
TaskY
RTOSISRW
CPU
ASIC
bus
InterruptRegisterMapped
Data bus (16)
ASICCPUAddressr bus
DBus Intface
64
ISR
Interrupt bus
= data buffer
= presence bit
= memcpy (64 bit read)
= frozen data buffer
= LYNX
InterruptRegisterMapped16
G
I
H8
32
InterruptRegisterMapped16
J
64
MPSOC 2002 - 31Luciano Lavagno ©
Creating Testbench
A
C B D
Test1
Test2
System-level Simulation
Results DB
Results DB
A
C B D
Co-Verification
Source
Source
MPSOC 2002 - 33Luciano Lavagno ©
OutlineSystem-on-chip design flow
–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation
Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration
MPSOC 2002 - 34Luciano Lavagno ©
Case study: wireless LAN physical layer
OFDM Physical Layer/Digital BB
MAC
OFDM TX OFDM RXDynamicReconfiguration
Network
Application
HiperLan/2
PicoRadio
Protocol Stack
MultiMedia WirelessNetworks;High Rate: 10 Mb/secLow Power: 10-100 mW
Ad Hoc Networks:Low Rate: b/sec - kb/secLow Power: 100W
MPSOC 2002 - 36Luciano Lavagno ©Implementation
Design Flow Application Specification
Algorithm Exploration
Functional Simulation and Refinement
Architecture Exploration:Performance Simulation
Architecture Refinement
COSSAP/C (Matlab/Simulink, …)
English (UML, …)
VCC (SystemStudio, …)
VCC (SystemStudio, …)
TX
OFDM RXOFDM TX
RX
OFDM Physical Layer
MAC, Network and Higher Layers
Functional IP Reuse
Mapping
C
Mapping
Functional Partitioning
MPSOC 2002 - 40Luciano Lavagno ©
Heterogeneous BehaviorMAC
GoT GoR
Preamble
DataPath
TX – Static Dataflow RX – Dynamic Dataflow
Sym
DataPath
Sync
CostToRCostToT
RXTX
IdleState
GoT GoR
GoT/GoR
Control-FSM
N N
MPSOC 2002 - 41Luciano Lavagno ©
void CPP_MODEL_IMPLEMENTATION::Init() { ….; Length = LenghtPar.Value(); // read parameter // Set data rate on 2 input ports: Real and Imag Real.SetDataRate(Length); Imag.SetDataRate(Length);}// Run() is executed every time the firing rule is satisfiedvoid CPP_MODEL_IMPLEMENTATION::Run() { for (i=0; i<Real.GetDataRate(); i++) {
// Read data from the input ports data[i] [0] = Real.Value(); data[i] [1] = Imag.Value(); } // Call the FFT procedure (C functional model) fft_cns_rot_bfp(data,….); // Write data to two output ports (OutReal, OutImag) for( i=0; i< Lenght; i++) {
OutReal.Post(data[i][0]);OutImag.Post(data[i][1]);}}
}}
Example of functional block
FFTReal
Imag
OutReal
OutImag
LenghtPar = 64
Imported from Cossap environment
64
64
MPSOC 2002 - 42Luciano Lavagno ©
OutlineSystem-on-chip design flow
–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation
Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration
MPSOC 2002 - 43Luciano Lavagno ©
Wireless LAN physical layer SOC architecture
FPGA
FFT
FIR
UART
BUFFER
FPGA config. mem.
Int. bridge
Micro -
Clock
gen.
SPS2
(instruction/
data RAM)
XBARInterface
Processor busInterface
XBAR
Processor bus
JtagInterface
DPR2/SPS2
Bridge
TEST(0..2)
Ck, reset
CK2 CK1 MCK VDD VSS Reset
I/D caches
Datapath
MPSOC 2002 - 44Luciano Lavagno ©
Crossbar featuresThe crossbar model is flexible in the number of supported masters and slaves (evaluated at simulation initialization time)
A prioritized FIFO is used to arbitrate multiple master requests for each slave
Number of parallel slave accesses defined through a parameter
A transmission can be suspended by higher priority requests (preemptive)
Arbitration overhead and slave access delays are parameterized
MPSOC 2002 - 45Luciano Lavagno ©
Crossbar Architecture Service Structure
Beh1 Beh2Post() Value() Behavior
Network
XBAR
FPGA MEM FFTArchitecture Network
DFSHAREDMEMORY (HW->HW)
Communication Pattern
XBarArbiter
Sender Receiver
XBarMaster
XBarSlave
Memory
XBarMasterFPGA PORT
MEM
MEM PORT
FFT PORT
BUS
Service Stack
Mem0Mem1
MPSOC 2002 - 46Luciano Lavagno ©
Crossbar Architecture Service Structure
Beh1 Beh2Post() Value() Behavior
Network
XBAR
FPGA MEM FFTArchitecture Network
DFREGISTERDIRECT (HW->HW)
Communication Pattern
XBarArbiter
Sender Receiver
XBarMaster XBarMasterFPGA PORT FFT PORT
BUS
Service Stack
MPSOC 2002 - 48Luciano Lavagno ©
OutlineSystem-on-chip design flow
–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation
Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration
MPSOC 2002 - 49Luciano Lavagno ©
Design Space Exploration
Explored several computation/communication architecture configurations
–FFT throughput (1 item / 4 clock cycles vs 1 item / 1 clock cycle)
–Number of buffer ports FFT FIR on crossbar–FPGA FFT communication pattern
–Shared Memory–Register Direct
MPSOC 2002 - 50Luciano Lavagno ©
Exploration Results
0
500
1000
1500
2000
2500
3000
3500
4000
SimA1 SimB1 SimC1 SimD1 SimA2 SimB2 SimC2 SimD2
FPGA FFT FIR
FFT:1/4 FFT:1/1
RD SM RDSM
5.8 7.2 8.2 8.2 9.6 13.7 12.5 15.6BitRate (Mb/s) 12
1P 2P 1P 2P 1P 2P 1P 2P
Hiperlan/2spec.
MPSOC 2002 - 51Luciano Lavagno ©
Conclusion
System-On-Chip Design requires methodology, tools and libraries
Separate computation, communication and architecture–computation: compiled and scheduled–communication: refined via patterns
Map computation and communication onto platform–simulate performance–generate implementation model for HW, SW and
communication