Speeding Up Robot Control Software Through Seamless...
Transcript of Speeding Up Robot Control Software Through Seamless...
Speeding Up Robot Control Software Through Seamless Integration With
FPGA
Xuan Sang LE1,2, Luc Fabresse1, Jannik Laval3, Jean-Christophe Le Lann2, Loic Lagadec2 and Noury Bouraqadi1
1Mines Douai-DIA, Univ. Lille
2ENSTA Bretagne
3DISP Laboratory, University Lumière Lyon 2
Contents
1. Context & Motivation
2. Platform for Software/FPGAs Integration
3. Applications
1. Robot Follower Using Camera for Object Tracking
2. Reconfigurable IP-Based Smart Sensor Network
4. Conclusion
2
Traditional Robotic Computing System
1
•Accessibility
•Simplicity
•Productivity
Context
3
Sensors
Actuators
sensing
controlProcessor
Robotic Middleware
Robotic Application
Pros.
vs
Cons.•Optimization opportunities
•Performance
•Energy Requirement
Processing + Planification
FPGAs as Accelerators for Robotic Real Time System
1 Motivation
4
Sensors
Actuators
sensing
control Processor
Robotic Middleware
Robotic Application
Post-processing + Planification
FPGAs
Pre-processing
•FPGAs
•Acceleration
•Hardware Flexibility
•Processor
•Flexible Software Environment
Benefits
FPGAs as Accelerators for Robotic Real Time System
1 Motivation
4
Sensors
Actuators
sensing
control Processor
Robotic Middleware
Robotic Application
Post-processing + Planification
FPGAs
Pre-processing
•FPGAs
•Acceleration
•Hardware Flexibility
•Processor
•Flexible Software Environment
Benefits Challenges• Require a specific knowledge → loss of productivity
• FPGAs and processor interfacing varies from project to project → development time consuming
Our Approach
1 Motivation
5
A Generic Software/Hardware Platform for Easily Integrating FPGAs in Existing Robotic System
Our Approach
1 Motivation
5
A Generic Software/Hardware Platform for Easily Integrating FPGAs in Existing Robotic System
physical linkFPGA Processor
(PC)
Robotic Middleware
(ROS)
Existing robotic system
1
Our Approach
1 Motivation
5
A Generic Software/Hardware Platform for Easily Integrating FPGAs in Existing Robotic System
physical linkFPGA Processor
(PC)
Robotic Middleware
(ROS)
Existing robotic system
1
FPG
A
Proc
esso
r
Single Board Computer/ SOC
SW Network2
Scenario
2 Platform for Software/FPGA Integration
6
FPGA
VHDL Legacy
Signal Processing, Image Processing
Processor SW API
Scenario
2 Platform for Software/FPGA Integration
6
FPGA
VHDL Legacy
Signal Processing, Image Processing
Processor SW API
Generate
Scenario
2 Platform for Software/FPGA Integration
6
FPGA
VHDL Legacy
Signal Processing, Image Processing
Processor SW API
deployment Generate
Scenario
2 Platform for Software/FPGA Integration
6
FPGA
VHDL Legacy
Signal Processing, Image Processing
Processor SW API
deployment GenerateGenerate
Scenario
2 Platform for Software/FPGA Integration
6
FPGA
VHDL Legacy
Signal Processing, Image Processing
Processor SW API
deployment GenerateGenerate
Robotic Middleware
1
Scenario
2 Platform for Software/FPGA Integration
6
FPGA
VHDL Legacy
Signal Processing, Image Processing
Processor SW API
deployment GenerateGenerate
Robotic Middleware
1
Network
2Robotic
Middleware
Architecture General
2 Platform for Software/FPGA Integration
7
Meta-model
VHDLlegacy
User logics
VHDL Parser
Wishbonemaster
IRQmanager
Wishbonewrapper Physical
interface
shared bus
Wishboneslave
Wishboneslave
User Logic 1
User Logic n
irq
<<generate>>
Drivers/memory
map device file…
.
Dynamic language
VMRobotic
API
Wrapperclasses
Userapp.
<<generate>>
Non-reusable Partially reusable completely reusable automatically generated
FPGA Processor
Architecture General
2 Platform for Software/FPGA Integration
7
•Meta-model
•Capture a subset of synthesizable VHDL
•Oriented-Object circuit description
•Circuit Model refactoring
•Implemented in Pharo/Smalltalk
VHDL
MM
M M
MM
M’
Parsing+
Model building
ModelRefacto-
rying
M’
MM
VHDL
Exporting
Bistream Synthesis VHDL
IntegrationHW Debug etc.
Meta-model
VHDLlegacy
User logics
VHDL Parser
Wishbonemaster
IRQmanager
Wishbonewrapper Physical
interface
shared bus
Wishboneslave
Wishboneslave
User Logic 1
User Logic n
irq
<<generate>>
Drivers/memory
map device file…
.
Dynamic language
VMRobotic
API
Wrapperclasses
Userapp.
<<generate>>
Non-reusable Partially reusable completely reusable automatically generated
FPGA Processor
FPGA Circuits & Interface
2 Platform for Software/FPGA Integration
8
•Automatic interface generation
•Address/data IO interface supports memory mapping mechanism of SW
•Wishbone is used for the interface
•Each user circuit connects to a slave
•Circuit’s registers can be accessed via a virtual address (provided by SW)
•IRQ support software handlers
•Automatically circuit registers addressing
Meta-model
VHDLlegacy
User logics
VHDL Parser
Wishbonemaster
IRQmanager
Wishbonewrapper Physical
interface
shared bus
Wishboneslave
Wishboneslave
User Logic 1
User Logic n
irq
<<generate>>
Drivers/memory
map device file…
.
Dynamic language
VMRobotic
API
Wrapperclasses
Userapp.
<<generate>>
Non-reusable Partially reusable completely reusable automatically generated
FPGA Processor
Software API
2 Platform for Software/FPGA Integration
9
Meta-model
VHDLlegacy
User logics
VHDL Parser
Wishbonemaster
IRQmanager
Wishbonewrapper Physical
interface
shared bus
Wishboneslave
Wishboneslave
User Logic 1
User Logic n
irq
<<generate>>
Drivers/memory
map device file…
.
Dynamic language
VMRobotic
API
Wrapperclasses
Userapp.
<<generate>>
Non-reusable Partially reusable completely reusable automatically generated
FPGA Processor
Software API
2 Platform for Software/FPGA Integration
9
• System Driver Layer
• User space IO Driver
• Users circuits are mapped to a virtual memory region
• Physical interface specific
• Compliant with Wishbone interface on FPGA
Meta-model
VHDLlegacy
User logics
VHDL Parser
Wishbonemaster
IRQmanager
Wishbonewrapper Physical
interface
shared bus
Wishboneslave
Wishboneslave
User Logic 1
User Logic n
irq
<<generate>>
Drivers/memory
map device file…
.
Dynamic language
VMRobotic
API
Wrapperclasses
Userapp.
<<generate>>
Non-reusable Partially reusable completely reusable automatically generated
FPGA Processor
Software API
2 Platform for Software/FPGA Integration
9
• System Driver Layer
• User space IO Driver
• Users circuits are mapped to a virtual memory region
• Physical interface specific
• Compliant with Wishbone interface on FPGA
Meta-model
VHDLlegacy
User logics
VHDL Parser
Wishbonemaster
IRQmanager
Wishbonewrapper Physical
interface
shared bus
Wishboneslave
Wishboneslave
User Logic 1
User Logic n
irq
<<generate>>
Drivers/memory
map device file…
.
Dynamic language
VMRobotic
API
Wrapperclasses
Userapp.
<<generate>>
Non-reusable Partially reusable completely reusable automatically generated
FPGA Processor
• API Layer
• FPGA circuits registers are accessed via virtual memory address
• Smalltalk VM supports memory mapping at primitives level
• Wrapper classes consider circuits as Smalltalk Object
• IRQ handler support
• Support high level robotic middleware (ROS, REST, etc)
Controllability and Debugging
2 Platform for Software/FPGA Integration
10
• Hardware Breakpoint
• A breakpoint controller is injected automatically
• Clock-gating technique is used to control the user circuit
• Breakpoint condition can be set from software
• Execution is resumable
• Draw back
• Vendor specific (Digital Clock Manager)
outputsinputs
Shared bus
Breakpoint controller (WB. slave)
User logic
clock controller
clock countercomparator
op. 1 operator op.2
break
global clock
resume
clk
clock count
active
Debug sub-circuit
IRQ Manager
irq
start
cnt := HWCounter new.cnt input:100.cnt setBreakpointOn:#output forValue:50 condition:#=.cnt start:true.cnt waitForIRQ:[
cnt bpActive ifTrue:[('Stop at: ', cnt output asString) print.('Steps: ', cnt clockCount asString) print.cnt resume:true.
] ifFalse:['Execution done' print.
]] timeOut:10 milliseconds.
Impact of Different Software Layers to the Performance of the Link
2 Platform for Software/FPGA Integration
11
• Experiments : Read/Write to a Single Clock Block Ram. 3 scenarios:
• Ideal, without our platform
• With the generated interface + our low level software API
• With the generated interface + our high level software API (Smalltalk)
• Device: APF 51
• FPGA : Xilinx Spartan 6 (LX9)
• Processor : Freescale iMX515 (Cortex A8 @ 800 Mhz)
• Physical Interface: 16 bit WEIM (Wireless External Interface Module)
Table 1
10 times Ideal Interface + low-level SW
Interface + Smalltalk
Write 20MB (s) 2,574 7,367 13,09
read 20MB (s) 3,993 6,343 10,1
Read speed (MB/s) 5,00876533934385 3,15308213778969 1,98019801980198
write speed (MB/s) 7,77000777000777 2,71480928464775 1,52788388082506
MB/
s
0
2
4
6
8
Ideal Interface + low-level SW Interface + Smalltalk
1,528
2,715
7,77
1,98
3,153
5,009
Read speed (MB/s) write speed (MB/s)
�1
Important amount of development time
Manual direct management of
data (addressing, data conversion)
All accesses performed via the wrapper classes
Robot Follower Using Camera for Object Tracking
3 Applications
12
Meta-model
address
OV7670Camera
CaptureLogic HSV Filter
Center of mass
Block RAM32k x 16bit
I2C
Registers data
siocsiod
8 bits
hrefvsync
pclk
ready
address
16 bits pixel
ready pixon
rez160x120
rez320x240
x ycontroller
addr.
data
frame_ok
16 bits collector
addr
. data
Frame buffer
Interface
+
Circuit
SW API + user Application
Object detection by color pattern
ROS Network Communicate
detected object position
(10 lines of code)
1
2 3
APF51OV7670
Robot Follower Using Camera for Object Tracking
3 Applications
13
address
OV7670Camera
CaptureLogic HSV Filter
Center of mass
Block RAM32k x 16bit
I2C
Registers data
siocsiod
8 bits
hrefvsync
pclk
ready
address
16 bits pixel
ready pixon
rez160x120
rez320x240
x ycontroller
addr.
data
frame_ok
16 bits collector
addr
. data
Frame buffer
• Problem with VGA images
• The original design work only with images QQVGA (160x120)
• Simulation & Debug (VGA)
• Simulation of HSV Filter Unit → 14 cycles/ pixel
• Capture Logic Unit, hard to simulate → Hardware Breakpoint
• Stop the circuit when a line (640 pixels) is captured
• 2556 cycles/ 640 pixels →4 cycles/pixel
Original design
Optimised design
Robot Follower Using Camera for Object Tracking
3 Applications
13
address
OV7670Camera
CaptureLogic HSV Filter
Center of mass
Block RAM32k x 16bit
I2C
Registers data
siocsiod
8 bits
hrefvsync
pclk
ready
address
16 bits pixel
ready pixon
rez160x120
rez320x240
x ycontroller
addr.
data
frame_ok
16 bits collector
addr
. data
Frame buffer
• Problem with VGA images
• The original design work only with images QQVGA (160x120)
• Simulation & Debug (VGA)
• Simulation of HSV Filter Unit → 14 cycles/ pixel
• Capture Logic Unit, hard to simulate → Hardware Breakpoint
• Stop the circuit when a line (640 pixels) is captured
• 2556 cycles/ 640 pixels →4 cycles/pixel
The filter takes more time to process a pixel than the time needed to produce
a pixel
Original design
Optimised design
Robot Follower Using Camera for Object Tracking
3 Applications
13
address
OV7670Camera
CaptureLogic HSV Filter
Center of mass
Block RAM32k x 16bit
I2C
Registers data
siocsiod
8 bits
hrefvsync
pclk
ready
address
16 bits pixel
ready pixon
rez160x120
rez320x240
x ycontroller
addr.
data
frame_ok
16 bits collector
addr
. data
Frame buffer
• Problem with VGA images
• The original design work only with images QQVGA (160x120)
• Simulation & Debug (VGA)
• Simulation of HSV Filter Unit → 14 cycles/ pixel
• Capture Logic Unit, hard to simulate → Hardware Breakpoint
• Stop the circuit when a line (640 pixels) is captured
• 2556 cycles/ 640 pixels →4 cycles/pixel
The filter takes more time to process a pixel than the time needed to produce
a pixel
OV7670Camera
CaptureLogic
HSV Filter
Center of mass
8 bits
hrefvsync
pclk
addr.
16 bits
4 pixelsCollector
ready
pixel 1
HSV Filterpixel 2
HSV Filterpixel 3
HSV Filterpixel 4
ready
address
Block RAM32k x 16bit
addr.
data
16 bits collector
addr
. data
address
4 bits
ready
ready
Frame buffer
x y frame_ok
I2C
Registers data
sioc siod
controller
rez160x120
rez320x240
Original design
Optimised design
Robot Follower Using Camera for Object Tracking
3 Applications
14
Hz
0
15
30
45
60
Window
47 146 195 245 295 345 395 443 493 543 593 644 694 744 794 843 893
Average
Frequency(hz)
Publishing rate of the ROS topic
Reconfigurable IP-Based Smart Sensor Network
3 Applications
15
FPGA
sensor 1 sensor n
Gateware
ARM
HTTP server
Driver/memory mapping IP-Stack
Smalltalk VM RESTFul API
…
Physical Interface
Meta-model
VHDLlegacy
User logics
VHDL Parser
Wishbonemaster
Wishbonewrapper
shared bus
Wishboneslave
Wishboneslave
sensor logic
sensorlogic
<<generate>>
…
<<generate>>
Gateware
Wrapperclasses
RESTFul API
Distributed Object APISoftwarepartition
SW
IP-Based Network
User application
Software
<<use>>
Execute
Base station
Sensor node
CameraUnitWrapper » x<remote >ˆself int32At:24
CameraUnitWrapper » y<remote >ˆself int32At:28
CameraUnitWrapper » position<remote >ˆ { self x. self y}
CameraUnitWrapper » positionDo:aBlock<remote:#aBlock>500 timesRepeat:[
aBlock value: self position.100 milliseconds wait]
CameraUnitWrapper » streamself positionDo:[:p| p print]
• IP based Sensor Network:
• Integration of FPGA to the nodes
• Compliant with IoT Standard
• Distributed Objected programming on sensor nodes using a dynamic language
• Over-the-air reconfiguration of HW/SW on the nodes
ContextIP-based Middleware
Sensor Network
Flexible software env.Ease of reconfiguration
Hardware flexibilityPerformance/energy
IP
IP IP
FPGA
ChallengesComplex hybrid software/hardware :• Development/ reconfiguration ?• Deployement ?
Application specific Generic or automatically generated Physical interface specific/partial reusable
Automatic software reconfigurationAutomatic gateware reconfiguration
Execution flow
Reconfigurable IP-Based Smart Sensor Network
3 Applications
16
Proof of concept of the hybrid system :• Base station: Implemented using Pharo Smalltalk• Sensor node:
• ARM: Httpd + Smalltalk VM + REST API• FPGA + camera: object tracking using a color pattern
• The base station fetches the object position from the node
64080REST+VM
544HTTPDModule
532
Shared MemoryRSS¹
¹Residence Set Size
S.W Memory footprint on a node (KB)
0.5streaming
5.3
GW. reconfig.0.5
26.20.5
4.7% CPU0.5% Memory
Resource
0
SW. reconfigIdleResource used on a sensor node at different stages
t t+5 t+10 t+15 t+20 t+25 t+30 t+35 t+40 t+45 t+50 t+55 t+60 t+65
t t+5 t+10 t+15 t+20 t+25 t+30 t+35 t+40 t+45 t+50 t+55 t+60 t+65
(s)
(s)
Network load² of streamming 500 data samples
Network load²
of the software/gateware
reconfiguration
t t+5 (s)
t t+5 (s)
4❖ A platform for interfacing FPGA to High level robotic
software
❖ Easy integration of legacy circuit design using an auto-generation code approach
❖ Debugging using hardware breakpoint
❖ Use cases
❖ Futur works
๏ Improvement of the meta-model
๏ Support more robotic middleware in our software API
Conclusion
17
FPGA vs Processor
4 Annexe
18
0
8,5
17
25,5
34
FPGA (VGA) processor(QQVGA)
Processing time per frame (ms)
0
0,4
0,8
1,2
1,6
FPGA (VGA) processor(QQVGA)
power (W)
• Device: APF 51
• Scenario 1:
• FPGA used to acquire image
• The object detection algorithm is implemented in C (on Processor)
• Image size QQVGA
• Scenario 2:
• The object detection algorithm is implemented on FPGA
• FPGA communicates object position with processor
• Image size VGA
Validity of the VHDL Parser
4 Annexe
19
• The VHDL Parser is tested again some set of VHDL Benchmark
• The ANTLR set: standard VHDL Package
• IWLS 2005 Benchmarks
• ITC’99 Benchmarks
• Gaisler Research Benchmarks