Markus Blocherer, Srinivas Boppu, Vahid Lari, Frank Hannig, Jürgen Teich
Hardware/Software Co-Design
University of Erlangen-Nuremberg
Transactor-based debugging of massively parallel processor array architectures
1st International Workshop on Multicore Application Debugging (MAD 2013), November 14-15, 2013, Germany
Agenda
Slide 2
Motivation
Invasive Computing
Hardware Debugging
Transactor-based Prototyping
Conclusions
Motivation
Slide 3
• Steady increase in application complexity
• Customization and heterogeneity are the key to future performance gains
• Steady increase in the number of cores on a chip
[Figure: tiled architecture with CPU tiles, i-Core tiles, TCPA tiles, memory tiles, and a memory/I/O tile, each attached to a NoC router]
Invasive Computing
Slide 4
• A resource-aware computing paradigm
− Each application may use available computing resources in 3 phases:
  • Exploring and claiming them (invade)
  • Configuring them for parallel computing (infect)
  • Releasing them (retreat)
• Support for resource-awareness at various levels
− Application level
− Compiler level
− Run-time system level
− Architecture level
• Architecture consists of different compute tiles (tiled architecture)
− RISC CPU tiles
− RISC CPUs with reconfigurable fabrics
− Programmable accelerators (TCPA)
Challenge: Simultaneous development of different architecture and software parts as well as their integration and validation
Invasion on TCPAs
Slide 5
[Figure: TCPA tile. Labels: AHB bus, Conf. & Com. Proc. (LEON3), Int. Ctr., AHB/APB bridge, APB bus, IM, GC, and AG blocks at the four corners, configuration manager, I/O buffers on all four sides, run-time system]
/* code to be executed sequentially */
...
val constraints = new AND();
constraints.add(new TypeConstraint(PEType.TCPA));
constraints.add(new PEquantity(4));
constraints.add(new Layout(LIN));
val claim = Claim.invade( constraints );
val ilet = (id:IncarnationID) => {
    /* code to be executed in parallel */
    ...
};
claim.infect( ilet );
...
claim.retreat();
• Run-time system interaction with TCPAs
− Resource requests and releases
− Application configuration
− Input/output data streams
How do we prototype TCPAs with tight software/hardware interactions?
InvasIC Prototyping Platform
Slide 6
• Synopsys FPGA-based prototyping platform
− Up to 12 million ASIC gates of capacity
− Tools for multi-FPGA prototypes (Certify) and RTL debug (Identify)
− UMRBus interface kit for host workstation
− Transactor library for AMBA to support bus-protocol communication
− Portable hardware
[Figure: prototyping setup. Labels: DUT, FPGA-based hardware, camera sensor I/F connector, host connector, OS, run-time control, display driver, DVI extension]
Typical HDL-based Development
Slide 7
[Figure: HDL simulator (ModelSim) running a VHDL testbench around the DUT and its I/O buffers]
HDL-Bridge-based Debugging
Slide 8
[Figure: HDL simulator (ModelSim) with a VHDL testbench, a software wrapper, a hardware wrapper, and the DUT with its I/O buffers]
Synopsys Transactor Library
Slide 9
• Library offers UMRBus-based transactors
− AMBA
− UART
− GPIO
− …
• C++ and Tcl API
• Easy to integrate into existing RTL designs
[Figure: ahb_master transactor with API calls write() and read(), UMRBus, CAPIM, and a read/write initiator on the AHB bus; ahb_slave transactor with API callbacks, UMRBus, CAPIM, and a read/write initiator]
Evaluation
Slide 10
Approach          Performance   Cycle accuracy   Signal observability   Intended use
HDL-Simulation    slowest       yes              high                   hardware development
HDL-Bridge        slow          yes              medium                 hardware debugging
AMBA-Transactor   high          no               low                    integration and extended testing
• Hardware development and debugging require cycle accuracy and flexible ways to observe individual signals
• For software development and testing, performance is a key feature, alongside observability of registers
Test Application
Slide 11
• A secondary application pre-occupies a number of PEs on the target TCPA tile
• The main video-based application (edge detection) then tries to claim the remaining PEs on the TCPA tile, while satisfying the following properties:
− Guaranteed constant throughput for a 1024x768 frame resolution
− Dynamic adaptation of quality of service (Laplace or Sobel)
[Figure: TCPA tile. Labels: AHB bus, Conf. & Com. Proc. (LEON3), Int. Ctr., AHB/APB bridge, APB bus, IM, GC, and AG blocks, configuration manager, I/O buffers, DVI extension board (Rx, Tx)]
Hardware/Software Interactions
Slide 12
[Figure: AMBA AHB transactor attached to the AHB bus of the TCPA tile. Labels: Conf. & Com. Proc. (LEON3), Int. Ctr., AHB/APB bridge, APB bus, IM blocks, configuration manager, I/O buffers, DVI extension board (Rx, Tx)]
1. Request an arbitrary number of PEs for a secondary application (n)
2. LEON3: Invade request for n PEs
3. TCPA: Invasions on invasion controllers
4. LEON3: Respond to the invasion request (n PEs)
5. Request 25 PEs for the edge detection application
6. LEON3: Invade request for 25 PEs
7. TCPA: Invasions on invasion controllers
8. LEON3: Respond to the invasion request (m)
9. Receive the number of invaded PEs (m)
− If 2 < m < 9: load Sobel 1x3 configuration
− If 8 < m < 25: load Laplace 3x3 configuration
− If m == 25: load Laplace 5x5 configuration
10. Send configuration stream and start computation
11. TCPA: Application execution
12. Application termination and resource release request
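The choice of filter configuration from the number of invaded PEs (m) can be sketched as a small function. This is only an illustration of the three conditions on the slide; the function name and the string labels are hypothetical, and the handling of values outside the stated ranges is an assumption, since the slide does not cover them.

```python
from typing import Optional


def select_configuration(m: int) -> Optional[str]:
    """Map the number of invaded PEs (m) to a filter configuration.

    The three ranges follow the slide; the labels are illustrative.
    """
    if m == 25:
        return "Laplace 5x5"
    if 8 < m < 25:
        return "Laplace 3x3"
    if 2 < m < 9:
        return "Sobel 1x3"
    # Assumption: the slide does not say what happens for m <= 2
    # or m > 25, so this sketch reports "no configuration".
    return None


if __name__ == "__main__":
    for m in (2, 3, 8, 9, 24, 25):
        print(m, select_configuration(m))
```

Because the ranges do not overlap, the order of the checks does not affect the result.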
Experimental Setup
Slide 14
[Figure: AHB bus connecting four LEON3 cores (0 to 3), a static RAM, and the master transactor]
• Step 1
− Write data to the RAM
− Measure the data rate
• Step 2
− Read data from the RAM
− Measure the data rate
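The two measurement steps can be sketched as a small host-side timing loop. This is a minimal sketch, not the setup used in the talk: `FakeMasterTransactor` is a hypothetical in-memory stand-in (it is not the Synopsys transactor API), `measure_rates` is an illustrative name, and one MByte is assumed to be 10^6 bytes.

```python
import time
from typing import Tuple


class FakeMasterTransactor:
    """Hypothetical stand-in for a master transactor, backed by a bytearray."""

    def __init__(self, ram_size: int) -> None:
        self.ram = bytearray(ram_size)

    def write(self, addr: int, data: bytes) -> None:
        self.ram[addr:addr + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        return bytes(self.ram[addr:addr + length])


def measure_rates(xactor, block_size: int, repeats: int = 100) -> Tuple[float, float]:
    """Return (write_rate, read_rate) in MBytes/sec for one block size."""
    payload = bytes(i % 256 for i in range(block_size))

    # Step 1: write the block repeatedly and time it.
    start = time.perf_counter()
    for _ in range(repeats):
        xactor.write(0, payload)
    write_time = max(time.perf_counter() - start, 1e-9)

    # Step 2: read the block back repeatedly and time it.
    data = b""
    start = time.perf_counter()
    for _ in range(repeats):
        data = xactor.read(0, block_size)
    read_time = max(time.perf_counter() - start, 1e-9)

    if data != payload:
        raise RuntimeError("read-back data does not match written data")

    total_mbytes = block_size * repeats / 1e6  # assumption: 1 MByte = 10^6 bytes
    return total_mbytes / write_time, total_mbytes / read_time


if __name__ == "__main__":
    xactor = FakeMasterTransactor(128 * 1024)
    for size in (128, 1024, 128 * 1024):
        w, r = measure_rates(xactor, size)
        print(f"{size:>7} bytes: write {w:.1f} MB/s, read {r:.1f} MB/s")
```

Against the in-memory stand-in the absolute numbers are meaningless; the sketch only shows the structure of the write-then-read measurement per block size.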
Master Transactor Data Rate
Slide 15
[Chart: master transactor data rate in MBytes/sec over transfer size in bytes]
Transfer size (bytes)   Write (MBytes/sec)   Read (MBytes/sec)
128                     0.261                0.331
256                     0.631                0.744
512                     1.005                1.56
1K                      2.466                2.907
2K                      3.487                4.584
4K                      6.666                6.132
8K                      9.138                7.344
16K                     13.388               8.576
32K                     17.724               8.98
64K                     20.798               9.18
128K                    23.174               9.458
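The chart values can be compared with a short script. This sketch assumes that the first data series is the write rate and the second the read rate, following the order of the chart legend; the function names are illustrative.

```python
# Chart values in MBytes/sec; series order assumed to follow the legend (write, read).
SIZES = ["128", "256", "512", "1K", "2K", "4K", "8K", "16K", "32K", "64K", "128K"]
WRITE = [0.261, 0.631, 1.005, 2.466, 3.487, 6.666, 9.138, 13.388, 17.724, 20.798, 23.174]
READ = [0.331, 0.744, 1.56, 2.907, 4.584, 6.132, 7.344, 8.576, 8.98, 9.18, 9.458]


def first_size_where_write_wins():
    """Return the smallest transfer size at which write is faster than read."""
    for size, w, r in zip(SIZES, WRITE, READ):
        if w > r:
            return size
    return None


def write_read_ratio(size: str) -> float:
    """Return the write rate divided by the read rate for a transfer size."""
    i = SIZES.index(size)
    return WRITE[i] / READ[i]


if __name__ == "__main__":
    print(first_size_where_write_wins())
    print(round(write_read_ratio("128K"), 2))
```

Under that assumption, reads are faster for transfers up to 2K bytes, writes overtake reads at 4K bytes, and at 128K bytes the write rate is about 2.45 times the read rate.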
Software Development
Slide 16
• GRMON
− General debug monitor for the LEON3 processor
  • Read/write access to all system registers and memory
  • Built-in disassembler and trace buffer management
  • Downloading and execution of LEON applications
  • Breakpoint and watchpoint management
  • Support for USB, JTAG, RS232, PCI, and Ethernet debug links
  • Tcl interface (scripts, procedures, variables, loops, etc.)
• Challenges
− Initial situation offered by GAISLER: bus-based MPSoC with up to 16 cores and only one GRMON instance
− But we need a GRMON instance for each tile
  • Each instance needs a separate connection medium to CHIPit
  • Synchronization between the tiles
GRMON Debugging
Slide 17
[Figure: tiled architecture (CPU tiles, i-Core tiles, TCPA tiles, memory, and a memory/I/O tile, each attached to a NoC router) with nine DEBUG units attached]
• Data transfer
− I/O tile
− Direct to the tiles
• Debug
− Debug unit
− GAISLER (GRMON)
Multiple Transactor-based Debugging
Slide 18
[Figure: tiled architecture (CPU tiles, i-Core tiles, TCPA tiles, memory, and a memory/I/O tile, each attached to a NoC router) with nine transactors attached]
• Data transfer
− I/O tile
− Direct to the tiles
• Debug
− AMBA transactor
− GAISLER (GRMON)
Conclusions
Slide 19
• HDL-Bridge-based debugging enables efficient and precise hardware development on multiple FPGAs
• The AHB transactor interface eased connectivity to, and control over, the FPGA-based prototype
• Transactor-based debugging offers fast and scalable hardware-software interaction for heterogeneous MPSoCs
• Our FPGA-based prototyping approach is feasible for MPSoC validation and demonstration
Thank you for your attention!
Slide 20
Contact: Markus Blocherer
Hardware/Software Co-Design
Universität Erlangen-Nürnberg
Cauerstraße 11, 91058 Erlangen, Germany
Email: [email protected]
www.invasive-computing.org