20161006 rsp2016 ohkawa-presen



PEAR LAB Utsunomiya Univ.

ARCHITECTURE EXPLORATION OF INTELLIGENT ROBOT SYSTEM USING ROS-COMPLIANT FPGA COMPONENT

Takeshi Ohkawa, Kazushi Yamashina, Takuya Matsumoto, Kanemitsu Ootsu, Takashi Yokota

Utsunomiya University, Japan

2016/10/6 RSP2016@ESWEEK, Pittsburgh 1

Outline

•Background

•ROS-Compliant FPGA Component

•Proposal

• Architecture Exploration Method for Intelligent Robot System by using ROS-Compliant FPGA Component

•Case Study: Visual SLAM

• Distributed Processing of Visual SLAM between robot and cloud

• Functional Partitioning

• Architecture Exploration at Model level

•Conclusion and future work

Background: Requirements for Robot development

•Requirements for Autonomous Mobile Robots

• Processing: high performance, e.g. image recognition, SLAM

• Mobile: low power due to battery operation

•Expectation: introduction of FPGA into robots

• Power efficiency: high-performance processing at low power

• Problem: FPGA development is difficult

•Robot engineering = integration of components

• Necessity of reducing the cost of introducing FPGA

•Our Solution:

• ROS (Robot Operating System) Compliant FPGA Component

ROS (Robot Operating System)

•ROS is a component-based application framework and set of build tools for robotic software.

• ROS is not an OS. It does not guarantee real-time behavior.

• ROS runs on Linux (Ubuntu).

•Abundant component library – productivity!

•Communication model in ROS: Publish/Subscribe

• Easy to develop, modify, test and maintain (cf. Client/Server)

[Figure: ROS communication model. A publisher node publishes a message (msg, data) to a topic and a subscriber node subscribes to it; nodes can also communicate by service invocation.]
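The publish/subscribe model on this slide can be illustrated with a minimal in-process sketch. It is not the ROS API: the `TopicBus` class, the `/image` topic name and the string messages are hypothetical stand-ins chosen only to show that publishers and subscribers are coupled through a topic name rather than through each other.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class TopicBus:
    """Toy stand-in for topic-based publish/subscribe (hypothetical, not ROS)."""

    def __init__(self) -> None:
        # topic name -> callbacks registered by subscriber nodes
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, msg: Any) -> None:
        # The publisher does not know who receives the message.
        for callback in list(self._subscribers[topic]):
            callback(msg)


bus = TopicBus()
received = []

# Two independent subscriber "nodes" on the same topic.
bus.subscribe("/image", received.append)
bus.subscribe("/image", lambda msg: received.append(msg.upper()))

# One publisher "node".
bus.publish("/image", "frame0")
print(received)
```

Because a subscriber can be added or removed without touching the publisher, a node can be modified or tested in isolation, which is the advantage over client/server coupling that the slide points to.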

ROS-compliant FPGA component [2]

ROS-Component structure for introducing FPGA into ROS system

[Figure: ROS compliant component. Topic → Subscribe → Interface for input → FPGA → Interface for output → Publish → Topic. Label: Communication.]

[2] Kazushi Yamashina, Takeshi Ohkawa, Kanemitsu Ootsu and Takashi Yokota: “Proposal of ROS-compliant FPGA Component for Low-Power Robotic Systems - case study on image processing application -”, Proceedings of 2nd International Workshop on FPGAs for Software Programmers, FSP2015, pp. 62-67, 2015.

ROS compliant FPGA component: Example of image processing

•Labeling

[Figure: Labeling example.]

Measured processing time of labeling [2], time (sec):

• FPGA+ARM…: 0.032

• SW only (ARM): 0.835

• SW only (PC): 0.075

• 26x faster than SW

[2] Kazushi Yamashina, Takeshi Ohkawa, Kanemitsu Ootsu and Takashi Yokota: “Proposal of ROS-compliant FPGA Component for Low-Power Robotic Systems - case study on image processing application -”, Proceedings of 2nd International Workshop on FPGAs for Software Programmers, FSP2015, pp. 62-67, 2015.
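The "26x faster than SW" figure can be checked from the three measured times on this slide. The sketch below assumes the values appear in the same order as the bar labels (FPGA+ARM 0.032 s, SW on ARM 0.835 s, SW on PC 0.075 s); that pairing is inferred from the 26x annotation, and the `speedup` helper is an illustrative name.

```python
# Measured labeling times in seconds, as read from the chart.
# Assumption: values are paired with the bar labels in the order they appear.
times_sec = {
    "FPGA+ARM": 0.032,
    "SW only (ARM)": 0.835,
    "SW only (PC)": 0.075,
}


def speedup(baseline: str, target: str = "FPGA+ARM") -> float:
    """How many times faster `target` is than `baseline`."""
    return times_sec[baseline] / times_sec[target]


for name in ("SW only (ARM)", "SW only (PC)"):
    print(f"FPGA+ARM vs {name}: {speedup(name):.1f}x")
```

Under this pairing the FPGA component is about 26x faster than software on the same ARM core, and still about 2.3x faster than software on the PC.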

Total latency of the ROS compliant FPGA component

•Speed up: 1.7x

•Measurement conditions

• Resolution: 1920x1080

• Zedboard (Zynq-7020)

• ARM (PS): Cortex-A9 666MHz

• FPGA (PL): 100MHz

• PC: Core i7 870 2.93GHz

[Figure: stacked bar chart of total latency, time (s), for FPGA+ARM…, SW only (ARM)… and SW only (PC)…, broken down into:]

• 1: Communication of ROS nodes (Publish/Subscribe)

• 2: From after subscribe to before labeling

• 3: Processing of labeling

• 4: From after labeling to before publish

• 5: Communication of ROS nodes

[2] Kazushi Yamashina, Takeshi Ohkawa, Kanemitsu Ootsu and Takashi Yokota: “Proposal of ROS-compliant FPGA Component for Low-Power Robotic Systems - case study on image processing application -”, Proceedings of 2nd International Workshop on FPGAs for Software Programmers, FSP2015, pp. 62-67, 2015.

cReComp*: automated component generator [3]

•Input

• User Logic: Verilog HDL

• Configuration file: scrp (specification for cReComp) or Python** (build AST)

•Output

• HDL: Control of FIFO

• C++: ROS node

•Target: Xilinx Zynq

[Figure: cReComp takes the user logic (*.v) and a config file (*.scrp or *.py) and generates the ROS-Compliant FPGA component: the hardware side (*.v, HW I/F with FIFOs around the user logic) and the software side (*.cpp, SW I/F, ROS node), connected by data comm.]

*cReComp: creator for Reconfigurable Component

**The python interface is developed using PyVerilog by S. Takamaeda https://github.com/PyHDI/Pyverilog

[3] Kazushi Yamashina, Takeshi Ohkawa, Kanemitsu Ootsu, Takashi Yokota, “cReComp: Automated Design Tool for ROS-Compliant FPGA Component”, IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), Sep. 22, 2016

https://github.com/kazuyamashi/cReComp

HW/SW architecture exploration using ROS-compliant FPGA component

•FPGA and software components can easily be swapped for each other at runtime

•The dataflow model can be directly mapped onto the runtime!

• Rapid System Prototyping

[Figure: (a) ROS system with pure software (SW): SW Nodes A, B, C and D connected through /topic1, /topic2, /topic3 and /topic4. (b) ROS system with SW/FPGA hybrid: the same graph with Node C replaced by an FPGA Node. Label: latency.]
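The idea that a software node and an FPGA node are interchangeable as long as they keep the same topics can be sketched as a small dataflow graph. Everything here is illustrative: the wiring between nodes B, C and D, the arithmetic each node performs, and the `run` and `make_graph` helpers are assumptions, and `fpga_node_c` is an ordinary Python function standing in for hardware.

```python
from typing import Callable, Dict, Tuple

# node name -> (subscribed topic, published topic, implementation)
Graph = Dict[str, Tuple[str, str, Callable[[int], int]]]


def run(graph: Graph, topics: Dict[str, int]) -> Dict[str, int]:
    """Fire each node once its input topic has data, until nothing can fire."""
    topics = dict(topics)
    pending = dict(graph)
    progressed = True
    while pending and progressed:
        progressed = False
        for name in list(pending):
            src, dst, impl = pending[name]
            if src in topics:
                topics[dst] = impl(topics[src])
                del pending[name]
                progressed = True
    return topics


def make_graph(node_c: Callable[[int], int]) -> Graph:
    # Hypothetical wiring; node A is modelled as the source of /topic1.
    return {
        "B": ("/topic1", "/topic2", lambda x: x + 1),
        "C": ("/topic2", "/topic3", node_c),
        "D": ("/topic3", "/topic4", lambda x: x - 3),
    }


def sw_node_c(x: int) -> int:
    return x * 2   # software implementation of node C


def fpga_node_c(x: int) -> int:
    return x << 1  # stand-in for the FPGA implementation: same function


sw_result = run(make_graph(sw_node_c), {"/topic1": 10})
hybrid_result = run(make_graph(fpga_node_c), {"/topic1": 10})
print(sw_result == hybrid_result)
```

Only the implementation bound to node C changes between the two runs; the topic names and the other nodes are untouched, which is what makes the swap cheap to try during exploration.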

Flow of architecture exploration

[Figure: flow of architecture exploration. Model-level architecture exploration: model design outputs a data-flow model. Runtime-level architecture exploration: components are reused from the component library (SW: ROS component; HW: ROS-Compliant FPGA component).]


SLAM (Simultaneous Localization and Mapping)

• Input: distance sensor values (laser range finder and so on)

• Output: a map of the robot's surroundings and the robot's own location in the map

• Trade-off: precision of the map vs. processing time

• A robot cannot carry a high-performance processor because of power constraints

[Figure: SLAM on a robot [1]. Input: external sensors (camera, LRF (depth)) and internal sensor (gyro) provide image data, depth data and control values. Self localization and mapping feed back to each other. Output: localization and mapping.]

Visual SLAM

•In brief: SLAM using only an image sensor

•See below: Processing flow example of Visual SLAM

• SLAM is performed after image processing

•Problem: Large amount of processing

[Figure: processing flow example of Visual SLAM. Blocks: Image Input, Feature Extraction, Feature Description, Feature Matching, Feature Pursuing, Self Localization, Mapping; grouped under "Image processing" and "SLAM".]

Basic concept of distributed SLAM processing

•Partition the SLAM processing at a certain point and transfer the data from the robot to the cloud

•Challenge: to reduce the processing load on the robot side, explore ways of off-loading processing to cloud servers

[Figure: processing flow from the image sensor. Processing at robot side: input image, SLAM front-end (feature). Data transfer to cloud-side server. Processing at cloud side: SLAM core and SLAM backend, using previous frames, update of location and map, parallel processing on servers. Output: self location and map. Labels: Data Volume, Processing Volume.]

Architecture exploration by partitioning and distributed processing of SLAM

•Target implementation of SLAM: RTAB-Map [16]

•RTAB-Map (Real-Time Appearance-Based Mapping)

• Input: RGB-D camera

• Winner in IROS 2014 Kinect Robot Navigation Contest *

[Figure: 3D map example (our lab) by RTAB-Map, showing the pose graph and point cloud; RGB-D camera (Kinect).]

[16] RTAB-Map, http://introlab.github.io/rtabmap/ (Access: 2015/12/14)

*Winning the IROS2014 Microsoft Kinect Challenge - SV ROS (San Jose, CA), http://www.meetup.com/ja/SV-ROS-users/pages/ (Access: 2015/10/28)

Bandwidth of dataflow and task load

Experimental environment (PC)

• OS: Ubuntu 14.04 LTS

• CPU: Intel Core i7-4712MQ (Max. 3.3GHz)

• Memory: 8GB

• ROS version: Indigo

• RTAB-Map: 0.10.11

• RGB-D sensor: Kinect for Windows (Microsoft Corporation); RGB img 640×480, Depth img 320×240, 30fps

[Figure: dataflow of RTAB-Map (camera, rtabmap). Legend: Node (Time), Topic, Dataflow, Bandwidth. Nodes: Camera (30fps), Self Localization (67.45ms), Map Construction (290.45ms), Map Visualization. Topics: RGB img, Depth img, Odom., Map data, Odom. for visualize, Stat. info. Bandwidths shown: 27.8MB/s, 18.5MB/s, 9.3KB/s, 378KB/s, 208KB/s, 2.22KB/s.]

Processing time per frame (avg. of 100 sec x 3)

3 candidates of partitioning

•Each candidate needs to transfer image data

• More than 370Mbps (27.8 + 18.5 = 46.3 MB/s)

•Does not satisfy the bandwidth requirement for mobile/wireless autonomous robots

[Figure: the RTAB-Map dataflow (Camera, Self Localization, Map Construction, Map Visualization) with three candidate partitioning points, #1, #2 and #3, between the robot side and the cloud side.]
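The "more than 370Mbps" figure follows from the two image-stream bandwidths on this slide. A minimal sketch of the unit conversion, assuming 1 MB = 10^6 bytes and 8 bits per byte (variable names are illustrative):

```python
# Bandwidths of the two image topics, in MB/s, as given on the slide.
image_streams_mb_per_s = [27.8, 18.5]

BITS_PER_BYTE = 8  # assumption: 1 MB/s = 8 Mbps (decimal megabytes)

total_mb_per_s = sum(image_streams_mb_per_s)
total_mbps = total_mb_per_s * BITS_PER_BYTE

print(f"{total_mb_per_s:.1f} MB/s = {total_mbps:.1f} Mbps")
```

Any partitioning point that cuts the raw image topics therefore has to carry about 370 Mbps over the robot's wireless link, which is the reason all three candidates are rejected.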

Result example: Architecture Exploration at Model level

[Figure: RTAB-Map dataflow with an added node and the chosen partitioning. Robot side: Camera Input (30fps) and the added node, ORB Feature Extraction (13.76ms). The Feature Vector topic (480KB/s) crosses the partition. Cloud side: Self Localization (53.69ms), Map Construction (276.69ms), Map Visualization. Other topics: RGB image, Depth image, Odom., Map Data, Odom. for visualize, Info. Bandwidths shown: 27.8MB/s, 18.5MB/s, 9.3KB/s, 378KB/s, 208KB/s, 2.22KB/s.]

The model can be directly mapped into ROS software and ROS-compliant FPGA component

Summary: Result of Architecture Exploration at Model level

•Communication bandwidth

• SW only: >370Mbps

• Proposed (HW/SW): 6.38Mbps

•Transferring data

• SW only: Image, 46.4MB/s

• Proposed (HW/SW): Feature Vector, 480KB/s

•Processing time at robot side

• SW only: #1: 0ms, #2: 67.45ms, #3: 357.9ms

• Proposed (HW/SW): 13.76ms

•Processing time at cloud side

• SW only: #1: 357.9ms, #2: 276.69ms, #3: 0ms

• Proposed (HW/SW): 330.4ms

(Time and power can be reduced more by using FPGA)
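Several entries in this summary can be re-derived from the per-node figures on the earlier slides. The sketch below uses only numbers that appear in the slides; the variable names are illustrative, and 1 MB = 1000 KB is an assumption made for the data-volume ratio.

```python
# Per-frame processing times (ms) from the dataflow figures.
sw_nodes_ms = [67.45, 290.45]             # SW only: the two processing nodes
proposed_robot_ms = 13.76                 # added ORB Feature Extraction node
proposed_cloud_nodes_ms = [53.69, 276.69]  # remaining nodes on the cloud side

sw_total_ms = sum(sw_nodes_ms)                   # all processing on one side
proposed_cloud_ms = sum(proposed_cloud_nodes_ms)

# Data crossing the robot/cloud link.
image_kb_per_s = 46.4 * 1000    # assumption: 1 MB = 1000 KB
feature_kb_per_s = 480.0
reduction = image_kb_per_s / feature_kb_per_s

print(f"SW only total: {sw_total_ms:.1f} ms")
print(f"Proposed: robot {proposed_robot_ms} ms, cloud {proposed_cloud_ms:.1f} ms")
print(f"Transferred data reduced by about {reduction:.0f}x")
```

Sending feature vectors instead of raw images cuts the transferred data by roughly two orders of magnitude while leaving only the 13.76 ms feature-extraction step on the robot.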

Conclusion

•We have proposed architecture exploration at model level and runtime level for intelligent robot systems using the ROS-compliant FPGA component.

•We learned from the case study of Visual SLAM that:

• Off-loading part of the Visual SLAM processing onto a server outside the robot has the potential to improve processing performance or to reduce power consumption on the robot side

•Future work

• Power exploration at model level, exploration at runtime level, and implementation of the distributed architecture of Visual SLAM

THANK YOU!