Speeding Up Robot Control Software Through Seamless...

32
Speeding Up Robot Control Software Through Seamless Integration With FPGA Xuan Sang LE 1,2 , Luc Fabresse 1 , Jannik Laval 3 , Jean-Christophe Le Lann 2 , Loic Lagadec 2 and Noury Bouraqadi 1 1 Mines Douai-DIA, Univ. Lille 2 ENSTA Bretagne 3 DISP Laboratory, University Lumière Lyon 2

Transcript of Speeding Up Robot Control Software Through Seamless...

Page 1: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Speeding Up Robot Control Software Through Seamless Integration With

FPGA

Xuan Sang LE1,2, Luc Fabresse1, Jannik Laval3, Jean-Christophe Le Lann2, Loic Lagadec2 and Noury Bouraqadi1

1Mines Douai-DIA, Univ. Lille

2ENSTA Bretagne

3DISP Laboratory, University Lumière Lyon 2

Page 2: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Contents

1. Context & Motivation

2. Platform for Software/FPGAs Integration

3. Applications

1. Robot Follower Using Camera for Object Tracking

2. Reconfigurable IP-Based Smart Sensor Network

4. Conclusion

2

Page 3: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Traditional Robotic Computing System

1

•Accessibility

•Simplicity

•Productivity

Context

3

Sensors

Actuators

sensing

controlProcessor

Robotic Middleware

Robotic Application

Pros.

vs

Cons.•Optimization opportunities

•Performance

•Energy Requirement

Processing + Planification

Page 4: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

FPGAs as Accelerators for Robotic Real Time System

1 Motivation

4

Sensors

Actuators

sensing

control Processor

Robotic Middleware

Robotic Application

Post-processing + Planification

FPGAs

Pre-processing

•FPGAs

•Acceleration

•Hardware Flexibility

•Processor

•Flexible Software Environment

Benefits

Page 5: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

FPGAs as Accelerators for Robotic Real Time System

1 Motivation

4

Sensors

Actuators

sensing

control Processor

Robotic Middleware

Robotic Application

Post-processing + Planification

FPGAs

Pre-processing

•FPGAs

•Acceleration

•Hardware Flexibility

•Processor

•Flexible Software Environment

Benefits Challenges• Require a specific knowledge → loss of productivity

• FPGAs and processor interfacing varies from project to project → development time consuming

Page 6: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Our Approach

1 Motivation

5

A Generic Software/Hardware Platform for Easily Integrating FPGAs in Existing Robotic System

Page 7: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Our Approach

1 Motivation

5

A Generic Software/Hardware Platform for Easily Integrating FPGAs in Existing Robotic System

physical linkFPGA Processor

(PC)

Robotic Middleware

(ROS)

Existing robotic system

1

Page 8: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Our Approach

1 Motivation

5

A Generic Software/Hardware Platform for Easily Integrating FPGAs in Existing Robotic System

physical linkFPGA Processor

(PC)

Robotic Middleware

(ROS)

Existing robotic system

1

FPG

A

Proc

esso

r

Single Board Computer/ SOC

SW Network2

Page 9: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Scenario

2 Platform for Software/FPGA Integration

6

FPGA

VHDL Legacy

Signal Processing, Image Processing

Processor SW API

Page 10: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Scenario

2 Platform for Software/FPGA Integration

6

FPGA

VHDL Legacy

Signal Processing, Image Processing

Processor SW API

Generate

Page 11: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Scenario

2 Platform for Software/FPGA Integration

6

FPGA

VHDL Legacy

Signal Processing, Image Processing

Processor SW API

deployment Generate

Page 12: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Scenario

2 Platform for Software/FPGA Integration

6

FPGA

VHDL Legacy

Signal Processing, Image Processing

Processor SW API

deployment GenerateGenerate

Page 13: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Scenario

2 Platform for Software/FPGA Integration

6

FPGA

VHDL Legacy

Signal Processing, Image Processing

Processor SW API

deployment GenerateGenerate

Robotic Middleware

1

Page 14: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Scenario

2 Platform for Software/FPGA Integration

6

FPGA

VHDL Legacy

Signal Processing, Image Processing

Processor SW API

deployment GenerateGenerate

Robotic Middleware

1

Network

2Robotic

Middleware

Page 15: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Architecture General

2 Platform for Software/FPGA Integration

7

Meta-model

VHDLlegacy

User logics

VHDL Parser

Wishbonemaster

IRQmanager

Wishbonewrapper Physical

interface

shared bus

Wishboneslave

Wishboneslave

User Logic 1

User Logic n

irq

<<generate>>

Drivers/memory

map device file…

.

Dynamic language

VMRobotic

API

Wrapperclasses

Userapp.

<<generate>>

Non-reusable Partially reusable completely reusable automatically generated

FPGA Processor

Page 16: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Architecture General

2 Platform for Software/FPGA Integration

7

•Meta-model

•Capture a subset of synthesizable VHDL

•Oriented-Object circuit description

•Circuit Model refactoring

•Implemented in Pharo/Smalltalk

VHDL

MM

M M

MM

M’

Parsing+

Model building

ModelRefacto-

rying

M’

MM

VHDL

Exporting

Bistream Synthesis VHDL

IntegrationHW Debug etc.

Meta-model

VHDLlegacy

User logics

VHDL Parser

Wishbonemaster

IRQmanager

Wishbonewrapper Physical

interface

shared bus

Wishboneslave

Wishboneslave

User Logic 1

User Logic n

irq

<<generate>>

Drivers/memory

map device file…

.

Dynamic language

VMRobotic

API

Wrapperclasses

Userapp.

<<generate>>

Non-reusable Partially reusable completely reusable automatically generated

FPGA Processor

Page 17: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

FPGA Circuits & Interface

2 Platform for Software/FPGA Integration

8

•Automatic interface generation

•Address/data IO interface supports memory mapping mechanism of SW

•Wishbone is used for the interface

•Each user circuit connects to a slave

•Circuit’s registers can be accessed via a virtual address (provided by SW)

•IRQ support software handlers

•Automatically circuit registers addressing

Meta-model

VHDLlegacy

User logics

VHDL Parser

Wishbonemaster

IRQmanager

Wishbonewrapper Physical

interface

shared bus

Wishboneslave

Wishboneslave

User Logic 1

User Logic n

irq

<<generate>>

Drivers/memory

map device file…

.

Dynamic language

VMRobotic

API

Wrapperclasses

Userapp.

<<generate>>

Non-reusable Partially reusable completely reusable automatically generated

FPGA Processor

Page 18: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Software API

2 Platform for Software/FPGA Integration

9

Meta-model

VHDLlegacy

User logics

VHDL Parser

Wishbonemaster

IRQmanager

Wishbonewrapper Physical

interface

shared bus

Wishboneslave

Wishboneslave

User Logic 1

User Logic n

irq

<<generate>>

Drivers/memory

map device file…

.

Dynamic language

VMRobotic

API

Wrapperclasses

Userapp.

<<generate>>

Non-reusable Partially reusable completely reusable automatically generated

FPGA Processor

Page 19: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Software API

2 Platform for Software/FPGA Integration

9

• System Driver Layer

• User space IO Driver

• Users circuits are mapped to a virtual memory region

• Physical interface specific

• Compliant with Wishbone interface on FPGA

Meta-model

VHDLlegacy

User logics

VHDL Parser

Wishbonemaster

IRQmanager

Wishbonewrapper Physical

interface

shared bus

Wishboneslave

Wishboneslave

User Logic 1

User Logic n

irq

<<generate>>

Drivers/memory

map device file…

.

Dynamic language

VMRobotic

API

Wrapperclasses

Userapp.

<<generate>>

Non-reusable Partially reusable completely reusable automatically generated

FPGA Processor

Page 20: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Software API

2 Platform for Software/FPGA Integration

9

• System Driver Layer

• User space IO Driver

• Users circuits are mapped to a virtual memory region

• Physical interface specific

• Compliant with Wishbone interface on FPGA

Meta-model

VHDLlegacy

User logics

VHDL Parser

Wishbonemaster

IRQmanager

Wishbonewrapper Physical

interface

shared bus

Wishboneslave

Wishboneslave

User Logic 1

User Logic n

irq

<<generate>>

Drivers/memory

map device file…

.

Dynamic language

VMRobotic

API

Wrapperclasses

Userapp.

<<generate>>

Non-reusable Partially reusable completely reusable automatically generated

FPGA Processor

• API Layer

• FPGA circuits registers are accessed via virtual memory address

• Smalltalk VM supports memory mapping at primitives level

• Wrapper classes consider circuits as Smalltalk Object

• IRQ handler support

• Support high level robotic middleware (ROS, REST, etc)

Page 21: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Controllability and Debugging

2 Platform for Software/FPGA Integration

10

• Hardware Breakpoint

• A breakpoint controller is injected automatically

• Clock-gating technique is used to control the user circuit

• Breakpoint condition can be set from software

• Execution is resumable

• Draw back

• Vendor specific (Digital Clock Manager)

outputsinputs

Shared bus

Breakpoint controller (WB. slave)

User logic

clock controller

clock countercomparator

op. 1 operator op.2

break

global clock

resume

clk

clock count

active

Debug sub-circuit

IRQ Manager

irq

start

cnt := HWCounter new.cnt input:100.cnt setBreakpointOn:#output forValue:50 condition:#=.cnt start:true.cnt waitForIRQ:[

cnt bpActive ifTrue:[('Stop at: ', cnt output asString) print.('Steps: ', cnt clockCount asString) print.cnt resume:true.

] ifFalse:['Execution done' print.

]] timeOut:10 milliseconds.

Page 22: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Impact of Different Software Layers to the Performance of the Link

2 Platform for Software/FPGA Integration

11

• Experiments : Read/Write to a Single Clock Block Ram. 3 scenarios:

• Ideal, without our platform

• With the generated interface + our low level software API

• With the generated interface + our high level software API (Smalltalk)

• Device: APF 51

• FPGA : Xilinx Spartan 6 (LX9)

• Processor : Freescale iMX515 (Cortex A8 @ 800 Mhz)

• Physical Interface: 16 bit WEIM (Wireless External Interface Module)

Table 1

10 times Ideal Interface + low-level SW

Interface + Smalltalk

Write 20MB (s) 2,574 7,367 13,09

read 20MB (s) 3,993 6,343 10,1

Read speed (MB/s) 5,00876533934385 3,15308213778969 1,98019801980198

write speed (MB/s) 7,77000777000777 2,71480928464775 1,52788388082506

MB/

s

0

2

4

6

8

Ideal Interface + low-level SW Interface + Smalltalk

1,528

2,715

7,77

1,98

3,153

5,009

Read speed (MB/s) write speed (MB/s)

�1

Important amount of development time

Manual direct management of

data (addressing, data conversion)

All accesses performed via the wrapper classes

Page 23: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Robot Follower Using Camera for Object Tracking

3 Applications

12

Meta-model

address

OV7670Camera

CaptureLogic HSV Filter

Center of mass

Block RAM32k x 16bit

I2C

Registers data

siocsiod

8 bits

hrefvsync

pclk

ready

address

16 bits pixel

ready pixon

rez160x120

rez320x240

x ycontroller

addr.

data

frame_ok

16 bits collector

addr

. data

Frame buffer

Interface

+

Circuit

SW API + user Application

Object detection by color pattern

ROS Network Communicate

detected object position

(10 lines of code)

1

2 3

APF51OV7670

Page 24: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Robot Follower Using Camera for Object Tracking

3 Applications

13

address

OV7670Camera

CaptureLogic HSV Filter

Center of mass

Block RAM32k x 16bit

I2C

Registers data

siocsiod

8 bits

hrefvsync

pclk

ready

address

16 bits pixel

ready pixon

rez160x120

rez320x240

x ycontroller

addr.

data

frame_ok

16 bits collector

addr

. data

Frame buffer

• Problem with VGA images

• The original design work only with images QQVGA (160x120)

• Simulation & Debug (VGA)

• Simulation of HSV Filter Unit → 14 cycles/ pixel

• Capture Logic Unit, hard to simulate → Hardware Breakpoint

• Stop the circuit when a line (640 pixels) is captured

• 2556 cycles/ 640 pixels →4 cycles/pixel

Original design

Optimised design

Page 25: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Robot Follower Using Camera for Object Tracking

3 Applications

13

address

OV7670Camera

CaptureLogic HSV Filter

Center of mass

Block RAM32k x 16bit

I2C

Registers data

siocsiod

8 bits

hrefvsync

pclk

ready

address

16 bits pixel

ready pixon

rez160x120

rez320x240

x ycontroller

addr.

data

frame_ok

16 bits collector

addr

. data

Frame buffer

• Problem with VGA images

• The original design work only with images QQVGA (160x120)

• Simulation & Debug (VGA)

• Simulation of HSV Filter Unit → 14 cycles/ pixel

• Capture Logic Unit, hard to simulate → Hardware Breakpoint

• Stop the circuit when a line (640 pixels) is captured

• 2556 cycles/ 640 pixels →4 cycles/pixel

The filter takes more time to process a pixel than the time needed to produce

a pixel

Original design

Optimised design

Page 26: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Robot Follower Using Camera for Object Tracking

3 Applications

13

address

OV7670Camera

CaptureLogic HSV Filter

Center of mass

Block RAM32k x 16bit

I2C

Registers data

siocsiod

8 bits

hrefvsync

pclk

ready

address

16 bits pixel

ready pixon

rez160x120

rez320x240

x ycontroller

addr.

data

frame_ok

16 bits collector

addr

. data

Frame buffer

• Problem with VGA images

• The original design work only with images QQVGA (160x120)

• Simulation & Debug (VGA)

• Simulation of HSV Filter Unit → 14 cycles/ pixel

• Capture Logic Unit, hard to simulate → Hardware Breakpoint

• Stop the circuit when a line (640 pixels) is captured

• 2556 cycles/ 640 pixels →4 cycles/pixel

The filter takes more time to process a pixel than the time needed to produce

a pixel

OV7670Camera

CaptureLogic

HSV Filter

Center of mass

8 bits

hrefvsync

pclk

addr.

16 bits

4 pixelsCollector

ready

pixel 1

HSV Filterpixel 2

HSV Filterpixel 3

HSV Filterpixel 4

ready

address

Block RAM32k x 16bit

addr.

data

16 bits collector

addr

. data

address

4 bits

ready

ready

Frame buffer

x y frame_ok

I2C

Registers data

sioc siod

controller

rez160x120

rez320x240

Original design

Optimised design

Page 27: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Robot Follower Using Camera for Object Tracking

3 Applications

14

Hz

0

15

30

45

60

Window

47 146 195 245 295 345 395 443 493 543 593 644 694 744 794 843 893

Average

Frequency(hz)

Publishing rate of the ROS topic

Page 28: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Reconfigurable IP-Based Smart Sensor Network

3 Applications

15

FPGA

sensor 1 sensor n

Gateware

ARM

HTTP server

Driver/memory mapping IP-Stack

Smalltalk VM RESTFul API

Physical Interface

Meta-model

VHDLlegacy

User logics

VHDL Parser

Wishbonemaster

Wishbonewrapper

shared bus

Wishboneslave

Wishboneslave

sensor logic

sensorlogic

<<generate>>

<<generate>>

Gateware

Wrapperclasses

RESTFul API

Distributed Object APISoftwarepartition

SW

IP-Based Network

User application

Software

<<use>>

Execute

Base station

Sensor node

CameraUnitWrapper » x<remote >ˆself int32At:24

CameraUnitWrapper » y<remote >ˆself int32At:28

CameraUnitWrapper » position<remote >ˆ { self x. self y}

CameraUnitWrapper » positionDo:aBlock<remote:#aBlock>500 timesRepeat:[

aBlock value: self position.100 milliseconds wait]

CameraUnitWrapper » streamself positionDo:[:p| p print]

• IP based Sensor Network:

• Integration of FPGA to the nodes

• Compliant with IoT Standard

• Distributed Objected programming on sensor nodes using a dynamic language

• Over-the-air reconfiguration of HW/SW on the nodes

ContextIP-based Middleware

Sensor Network

Flexible software env.Ease of reconfiguration

Hardware flexibilityPerformance/energy

IP

IP IP

FPGA

ChallengesComplex hybrid software/hardware :• Development/ reconfiguration ?• Deployement ?

Application specific Generic or automatically generated Physical interface specific/partial reusable

Automatic software reconfigurationAutomatic gateware reconfiguration

Execution flow

Page 29: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Reconfigurable IP-Based Smart Sensor Network

3 Applications

16

Proof of concept of the hybrid system :• Base station: Implemented using Pharo Smalltalk• Sensor node:

• ARM: Httpd + Smalltalk VM + REST API• FPGA + camera: object tracking using a color pattern

• The base station fetches the object position from the node

64080REST+VM

544HTTPDModule

532

Shared MemoryRSS¹

¹Residence Set Size

S.W Memory footprint on a node (KB)

0.5streaming

5.3

GW. reconfig.0.5

26.20.5

4.7% CPU0.5% Memory

Resource

0

SW. reconfigIdleResource used on a sensor node at different stages

t t+5 t+10 t+15 t+20 t+25 t+30 t+35 t+40 t+45 t+50 t+55 t+60 t+65

t t+5 t+10 t+15 t+20 t+25 t+30 t+35 t+40 t+45 t+50 t+55 t+60 t+65

(s)

(s)

Network load² of streamming 500 data samples

Network load²

of the software/gateware

reconfiguration

t t+5 (s)

t t+5 (s)

Page 30: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

4❖ A platform for interfacing FPGA to High level robotic

software

❖ Easy integration of legacy circuit design using an auto-generation code approach

❖ Debugging using hardware breakpoint

❖ Use cases

❖ Futur works

๏ Improvement of the meta-model

๏ Support more robotic middleware in our software API

Conclusion

17

Page 31: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

FPGA vs Processor

4 Annexe

18

0

8,5

17

25,5

34

FPGA (VGA) processor(QQVGA)

Processing time per frame (ms)

0

0,4

0,8

1,2

1,6

FPGA (VGA) processor(QQVGA)

power (W)

• Device: APF 51

• Scenario 1:

• FPGA used to acquire image

• The object detection algorithm is implemented in C (on Processor)

• Image size QQVGA

• Scenario 2:

• The object detection algorithm is implemented on FPGA

• FPGA communicates object position with processor

• Image size VGA

Page 32: Speeding Up Robot Control Software Through Seamless ...lab-sticc.univ-brest.fr/~goulven/sharc2016/... · Speeding Up Robot Control Software Through Seamless Integration With FPGA

Validity of the VHDL Parser

4 Annexe

19

• The VHDL Parser is tested again some set of VHDL Benchmark

• The ANTLR set: standard VHDL Package

• IWLS 2005 Benchmarks

• ITC’99 Benchmarks

• Gaisler Research Benchmarks