Embedded AI Sensing Technology for Self-Driving Applications · 2019-05-23 · Embedded Deep...

NCTU iVSLAB CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE

Embedded AI Sensing Technology for Self-Driving Applications

Prof. Jiun-In Guo

Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan

April 25th, 2019


Outline

• Brief introduction to NCTU iVS Lab

• Key factors of embedded AI technology

• Industrial collaboration and conclusion


Our Strength: Embedded AI for ADAS/Self-driving

NCTU

World of Self-Driving Vehicles


Introduction to NCTU iVSLab

• 8 years of ADAS development

• 25 vision-based ADAS functions

• 90 academia-industry cooperation experiences


ADAS Functions We Developed

HDR Night Vision
• Single-camera
• Dual-camera
• Camera array
• Local HDR

Wide-view Video Stitching (360 degree surround view)

Fish-eye Correction

Inclement Weather Processing
• DLCE
• Dehazing

Speed Limit Detection
• Circle, Rectangle
• Triangular sign
• IMX6 integration

Alley View

Car License Plate Detection

Traffic Light Detection

Lane Departure Warning (Straight/Curve lanes, ISO 17361)

Forward Collision Warning

Blind Spot Detection
• Optical flow
• Machine learning
• Deep learning

Pedestrian & Scooter Detection
• Machine learning
• Deep learning

Stop & Go

2D/3D Hand Tracking

Dangerous Driving Behavior Detection

Object Detection
• Embedded deep learning

ADAS Integration
• Introduction
• Mobile APP

https://youtu.be/RFqHjG9ttFk

https://youtu.be/ScFlxFIwAgI

https://www.youtube.com/watch?v=6mnk5F-Z9Vg&feature=youtu.be

https://www.youtube.com/watch?v=08m1AJQI12Q&feature=youtu.be

https://youtu.be/sapVLbT2X6c

https://www.youtube.com/watch?v=moQk9i9ehDg&feature=youtu.be

https://www.youtube.com/watch?v=fArqUV2ODwg&feature=youtu.be

https://youtu.be/lcJpV4sfCaw

https://youtu.be/tvCQv8-Xgkw

https://youtu.be/7YbavyjrQIk

https://youtu.be/e_W506x0SMU

https://www.youtube.com/watch?v=T9cjvD1SyAU&feature=youtu.be

https://youtu.be/t6Kp2T_KjXU

https://youtu.be/3u8Io0ZjX4A

https://youtu.be/DjNHsbRK4ls

https://youtu.be/bW_UQMRFRHY

https://www.youtube.com/watch?v=gznGCj7eW8I&feature=youtu.be

https://youtu.be/Q02SOKr0HeE

https://youtu.be/8YMKlqjxdKU

https://youtu.be/0qydzWHSldY

https://www.youtube.com/watch?v=0YV-732EFUE

https://youtu.be/iR1Wk9y3Lrk

https://www.youtube.com/watch?v=YbKDv9e1GSQ

https://youtu.be/T7l5NFSA9N0

研華月會_20161007/Viscovery 計畫討論/LDWS_FCWS_Stop&Go in APP.MOV


NCTU AI Project Proposal (2018-2021)



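Project Breakthrough (1/2)

• Deep learning tool development and dataset construction

• ezLabel, an automated deep learning labeling tool, improves labeling efficiency 10x. It is the first automatic labeling tool to support video, and it won two prizes at the AUDI Innovation Award Taiwan.

• Built the image databases required for deep learning, now totaling more than 14.5 million samples. On October 26, 2018, 96,000 labeled ADAS/self-driving deep learning samples were released for free download by interested members of academia and industry.

• Developed ezQuant, the first tool to support bit-accurate dynamic fixed-point quantization for CNN model training/inferencing. It supports development of deep learning models dedicated to CNN hardware accelerators (less than 2.2% accuracy drop on the NCTU SSD lite model and 3.6% accuracy drop on the NCTU one stage Pvanet model).

• Developed ezHybrid-M, a training tool for hybrid fixed-point/binary CNN deep learning models. It can train a hybrid fixed-point/binary CNN model (reducing model size by 91%) at a cost of less than 2% quality drop (on NCTU SSD-Mobilenet).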


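Project Breakthrough (2/2)

• Embedded deep learning model development

• An embedded deep learning model (TSBBR) that detects vehicles more than 200 m away, 4x the detection distance of the literature benchmark YOLO v2, with accuracy 10% mAP higher than the YOLO v2 model. It runs in real time on NVIDIA DRIVE-PX2.

• Developed deep learning technology that detects traffic lights/traffic signs 100 m away with detection accuracy of up to 86% mAP.

• Developed embedded deep learning object recognition technology: the first to successfully port the SSD lite deep learning model to the TI TDA2X platform with architecture optimization. Forward objects can be detected up to 100 m away (twice the detection distance of the current SSD model), with accuracy comparable to the current SSD.

• An overtaking warning system combining object recognition and object behavior recognition. It predicts whether a rear vehicle (car or scooter) will overtake within the next 3 seconds with accuracy above 95%, can be combined with electronic rear-view mirror applications, and is the first of its kind in the literature. It can also analyze whether a pedestrian ahead is about to cross the road, with accuracy of 90%, and can be integrated into AEB (automatic emergency braking) systems.

• Developed camera/radar sensor fusion technology that improves object detection accuracy by 14% (81% → 95%), greatly improving the reliability of object detection for ADAS/self-driving applications.

• Realization in ADAS/special-purpose unmanned vehicles

• Developed Taiwan's first intelligent self-driving wheelchair, which won the Jury Bronze Award at the 18th Macronix Golden Silicon Awards.

• Honors

• Received an honorable mention in the Ministry of Science and Technology 2018 AIslander competition and exhibited at the Taiwan Pavilion in CES 2019 Eureka Park.

• Selected for the Ministry of Science and Technology 2018 Future Tech exhibition and received the Future Tech Breakthrough Award.

• In 2018, derived industry-academia collaboration projects reached 18, with collaboration funding of NT$15.98 million; 8 academic papers published, 5 competition awards, 1 patent granted, 6 patents pending, and 2 international collaboration MOUs signed.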




Challenges: Why AI Sensing Technology? Potential Solution for Fatal Crash Video

• Deep learning technology combining inclement weather processing

Original: The AEB solution cannot detect trucks in this case.

NCTU iVS Lab (Dehazing + Deep learning): Detecting truck 2 seconds before collision


Challenges: Why AI Sensing Technology? Potential Solution for Fatal Car Accidents

• Self-driving Uber car involved in fatal accident in Arizona - NBC News (March 20th, 2018)

• Vehicle speed = 60 km/h (about 16.7 m/s)

• Braking distance = about 20 m for a = -7 m/s²

• Braking time to stop = about 2.4 s

• At most 1 s remains for reaction once the pedestrian's feet are visible, or 0.6-0.7 s once the whole pedestrian is visible.

• Need to detect the pedestrian earlier based on sensor fusion technology
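The braking figures on this slide follow from constant-deceleration kinematics (d = v² / (2|a|), t = v / |a|). A minimal sketch that reproduces them, assuming constant deceleration and ignoring reaction delay; the function name `stopping_profile` is illustrative only:

```python
def stopping_profile(speed_kmh: float, decel_ms2: float) -> tuple:
    """Return (speed in m/s, braking distance in m, braking time in s).

    Assumes constant deceleration of magnitude `decel_ms2` down to a full stop.
    """
    v = speed_kmh / 3.6                      # km/h -> m/s
    distance = v ** 2 / (2.0 * decel_ms2)    # d = v^2 / (2|a|)
    time_to_stop = v / decel_ms2             # t = v / |a|
    return v, distance, time_to_stop


v, d, t = stopping_profile(60.0, 7.0)
print(f"{v:.1f} m/s, {d:.1f} m, {t:.1f} s")  # prints "16.7 m/s, 19.8 m, 2.4 s"
```

With only 0.6 to 1 s of visibility, the vehicle covers roughly 10 to 17 m before braking can even complete, which is why earlier detection matters.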

Original: The AEB solution cannot detect the pedestrian in this case.

NCTU DLCE+AI technology: Detecting pedestrian 0.6 seconds before collision


Embedded Deep Learning Development

• Example: Video Object Detection

Data Labeling → Data Augmentation → Model Training → Model Quantization and Porting on AI SoC


Embedded AI Sensing Technology

Dataset: Efficient Labeling, Abundant Dataset

Software: High Performance, High Quality

Hardware: Embedded System, Accelerator, System Integration





Dataset Problem that AI Faced

“Each hour of data collected takes almost 800 human hours to annotate,” said Sameep Tandon, CEO of self-driving startup Drive.ai in Mountain View, Calif.

1:800

Stupid, Time-consuming, Annoying, Exhausting, Tedious, Boring, Slow, High-cost, Bad-quality


Solution

Automatic labeling tool

https://www.aicreda.com

Powered by NCTU iVS Lab and creDa!


Vision of ezLabel

Collect Data, Label Data, Train Model, Select Data, Test Model


Service of ezLabel

Data analytics

Automatic labeling tool (ezLabel)

Data management


Features of ezLabel

Segmentation Labeling

Behavior Labeling

Automatic Labeling

Delivery: May 20th


ezLabel AutoLabeling

Step 1, Step 2, Step 3

2018 AUDI Innovation Award Taiwan

• WeMo Scooter Prize

• AUDI HQ prize (Proof of Concept)


The Fast Labeling Tool – ezLabel 2.0

Labeling a vehicle in 1100 frames

Matlab: 33m15s (33.25 min)

ezLabel 2.0: 03m33s (3.55 min)

9.35x speed-up in single object labeling

15x speed-up in multiple objects labeling


ADAS Datasets We Have Built

14+ million data samples!

Vehicle: 4,420,646

Pedestrian: 1,111,886

Rider: 1,880,173

Lane marks: 4,189,090

Signage: 898,369

Traffic light: 1,077,979

Behavior: 337,447


How to Use ezLabel?



Detecting Far Objects: Task Specific Bounding Box Regressor (TSBBR)

Single bounding box regressor / Proposed TSBBR architecture

• Conditional back propagation mechanism

• Output from convolution 6

• Minimum object for detection: 15x15 pixels

Quality

• mAP = 82.4% @ iVS dataset

• mAP = 86.5% @ Pascal VOC 2007 (add iVS data in training)

Performance (resolution: 448x448)

• 67 fps (NVIDIA 1080 Ti)

• 20 fps (NVIDIA DRIVE PX-2)

• 9 fps (NVIDIA Jetson TX-2)


Detecting Far Objects: Task Specific Bounding Box Regressor (TSBBR)

[Figure: TSBBR architecture (patent pending). Labeled blocks: input (resolution 448x448), base layers (Darknet-19), Pool 5, Convolution 6, Pool 6, hyper features, pass through, shield convolution (half kernels), regressor for small objects, regressor for large objects.]


Detecting Far Objects: Robust in Different Weather Conditions

YOLOv2 vs NCTU iVSLab


Far Distance Object Detection: Detecting vehicles as far as 200m (on Carsim RT)

YOLO v2 (50m) vs NCTU iVSLAB (200m)


Traffic Sign Recognition based on PVANET

• C.ReLU modules widely used in the early stages
  • Simple features come with orthogonality
  • Reduces computational complexity

• Inception modules in the middle and rear stages
  • Deeper network, better result
  • Combines multi-scale features

• HyperNet at the end to combine multi-scale features
  • Multi-scale features from different layers
  • Channel-wise concatenation at the end
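The C.ReLU idea mentioned above can be illustrated in a few lines. C.ReLU (concatenated ReLU) keeps both the positive and the negated-positive part of each feature channel, so a convolution only has to compute half as many filters to obtain the same number of output channels. The sketch below is plain Python on toy data and is not taken from the PVANET implementation; `crelu` and the sample values are illustrative only:

```python
def crelu(channels):
    """Concatenated ReLU over a list of feature channels.

    `channels` is a list of per-channel value lists. The result has twice as
    many channels: ReLU(x) for every channel, followed by ReLU(-x).
    """
    positive = [[max(v, 0.0) for v in ch] for ch in channels]
    negative = [[max(-v, 0.0) for v in ch] for ch in channels]
    return positive + negative


features = [[-1.0, 2.0], [3.0, -4.0]]   # 2 channels, 2 values each
out = crelu(features)
print(len(out))   # 4 channels
print(out)        # [[0.0, 2.0], [3.0, 0.0], [1.0, 0.0], [0.0, 4.0]]
```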


Traffic Sign Recognition (Based on Pvanet)

• mAP = 86%

• Min. 16x16 pixels traffic signs in Taiwan

• As far as 100m detection on traffic light (at night)


Multiple Vehicle Detection (Based on Pvanet)


Object Detection and Tracking

• Deep learning object detection with Pvalite

• NCTU IOU-based fast moving object tracking

• Performance: input video HD1080 @ 30 fps; CNN kernel 720x480 @ 35 fps (GTX1050)
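The slide does not describe how the NCTU tracker works internally. As a generic illustration of IoU-based tracking, the sketch below associates existing tracks with new detections by greedy intersection-over-union matching. The threshold value, the box format (x1, y1, x2, y2), and the function names are assumptions made for this example:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedily map each track id to the unused detection with the highest IoU.

    Tracks whose best IoU is below `threshold` stay unmatched.
    """
    assignments = {}
    used = set()
    for track_id, box in tracks.items():
        best_index, best_score = None, threshold
        for index, det in enumerate(detections):
            if index in used:
                continue
            score = iou(box, det)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is not None:
            assignments[track_id] = best_index
            used.add(best_index)
    return assignments


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(52, 51, 62, 61), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, detections))  # {1: 1, 2: 0}
```

Matching by overlap alone is cheap, which fits a pipeline where the detector already runs at or above the video frame rate; fast-moving objects may need a lower threshold or motion prediction.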


Object Behavior Prediction: Rear Vehicle Overtaking Prediction

• Combining 2D convolution and 3D convolution
  • 2D convolution for determining region proposals
  • 3D convolution for object behavior classification

• 2D CNN and 3D CNN
  • Applying 2D convolution on a video (multiple frames): temporal information loss
  • Applying 3D convolution on a video
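The difference between the two convolutions can be shown with a toy example in plain Python. A 2D convolution that treats the frames as input channels sums over all frames and returns a single map, so the time axis disappears; a 3D convolution with a short temporal kernel slides along time and keeps one output per time step. The function names and toy data below are illustrative only:

```python
def conv3d_valid(video, kernel):
    """'Valid' 3D convolution (cross-correlation) over a [T][H][W] video.

    `kernel` is [kt][kh][kw]; the output is [T-kt+1][H-kh+1][W-kw+1].
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def conv2d_on_frames(video, kernel):
    """2D convolution with frames as channels: the kernel spans all T frames,
    so the result is a single [H-kh+1][W-kw+1] map with no time axis."""
    return conv3d_valid(video, kernel)[0]


# Four 3x3 frames; every pixel of frame t has value t.
video = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]
kernel_3d = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]   # spans 2 frames
kernel_2d = [[[1.0] * 3 for _ in range(3)] for _ in range(4)]   # spans all 4 frames

out_3d = conv3d_valid(video, kernel_3d)
out_2d = conv2d_on_frames(video, kernel_2d)
print(len(out_3d), out_3d)   # 3 [[[9.0]], [[27.0]], [[45.0]]]
print(out_2d)                # [[54.0]]
```

The 3D output still shows the signal growing over time, while the 2D output collapses the whole clip into one number per location.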


Object Behavior Prediction: Rear Vehicle Overtaking Prediction

Predict not overtaking / Predict overtaking

• Input: rear camera

• Output: overtaking or not from left or right direction in the next three seconds

• Accuracy: 95.7%

• Performance: 29 fps @ C3D 112x112 (NVIDIA Jetson TX-2)

• Model size: 4.5M @ C3D 112x112


Object Behavior Prediction: Predicting rear vehicle overtaking using C3D with heatmap layer

[Figure: C3D network with heatmap layer. Labels: data input of 16 frames at 112x112; 3D conv1, ReLU, and 3D max pool, followed by further 3D CNN stages with spatial sizes 56, 28, 14, and 7; a 5x5 heatmap with example values for the training process and the testing process; resize between 5x5 and 112x112; HSV.]


Object Behavior Prediction

Using C3D only vs using 2D CNN and C3D together


Extension to Other Applications

• Pedestrian crossing

• Performance: 20 fps @ 416x416 2D CNN / C3D 112x112 (NVIDIA Jetson TX-2)





ADAS/Self-Driving Industry Collaboration



Computing Platforms for Self-Driving

Chip vendors

S32V234, S32V334

TDA4X (2019/Q4)

DRIVE-PX2


Hardware Platform for AI Deep Learning

• Requiring a deep learning model facilitating real-time implementation
  • Lightweight and accurate model
  • No control-intensive operations

NVIDIA Jetson TX-1/TX-2/Xavier or NVIDIA DRIVE-PX2

TI TDA2X (40-50 GMAC/s)

NXP S32V234 (80 GMAC/s)

RCNN Accelerator


Proposed One Stage Faster RCNN Model for Embedded System (One Stage Pvanet)

• Architecture
  • Removing
    • ROI pooling layer
    • Fully connected layer
  • Modifying
    • RPN layer to be CLN layer (CLN: Classification and Localization Network)

• To be realized in embedded SoCs
  • TI TDA2X
  • Renesas R-car H3
  • CNN accelerator

[Figure: Single image, 2D CNN, CLN layer, class score and bbox prediction]

• Advantages over YOLO v2 (416x416)
  • Reducing 93% model size (3.6M)
  • Reducing 73% complexity (3.4G MAC/frame)
  • Same quality
  • 76.5% mAP @ Pascal VOC 2007 5-class objects


NCTU SSD Lite Object Detection on TI TDA2X (SSD Lite 512x512)

First AI model on TDA2X in Taiwan

• To support better quality and longer detection distance

SSD Jacinto-Net 512x512

Minimum detectable object pixel width: 50

NCTU SSD Lite 512x512

Minimum detectable object pixel width : 30


Person car motorbike bicycle bus Total (%)

mAP 73.16 77.40 75.58 75.12 72.89 74.83

NCTU SSD Lite Object Detection on TI TDA2X: Training accuracy on PASCAL VOC dataset

• Original dense model:

• Sparse model: (70.5%)

• After fine-tune with coco dataset:


mAP 68.80 74.20 70.85 71.71 67.12 70.55


mAP 70.72 77.55 72.36 73.34 72.06 72.71

TI SSD Jacinto-Net 512x512 mAP=70.8%

• Performance evaluation:

Model | Sparsity | # of non-zero parameters (million) | MAC (Giga/frame) | FPS
SSD lite 512x512 with 32 | 0.0% | 3.62 | 2.97 | 15.95
SSD lite 512x512 with 32 | 70.5% | 1.07 | 1.56 | 30.30
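The slides do not say how the sparse model is produced. One common way to reach a target sparsity is magnitude pruning, sketched below on toy weights as an assumption rather than as the method used here. The last lines check that 70.5% sparsity applied to 3.62 million parameters leaves about 1.07 million non-zero parameters, as reported in the performance figures.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = round(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Toy weights, illustrative values only.
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.4, -0.03, 0.08, -0.6]
pruned = prune_by_magnitude(weights, 0.7)
nonzero = sum(1 for w in pruned if w != 0.0)
print(pruned)    # [0.9, 0.0, 0.0, 0.0, -0.7, 0.0, 0.0, 0.0, 0.0, -0.6]
print(nonzero)   # 3

# Consistency check against the reported figures (values in millions).
remaining_million = round(3.62 * (1 - 0.705), 2)
print(remaining_million)   # 1.07
```

In practice a pruned network is usually fine-tuned afterwards to recover accuracy.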


Comparison with MobileNet-YOLOV3

• NCTU SSD Lite outperforms MobileNet-YOLOV3 in both speed and accuracy

Caffe Framework | MobileNet-SSD | MobileNet-YOLOV3 | NCTU SSD Lite
Fps (1080Ti)(300x) | 10 | 20 | 100

Model | MobileNet-SSD | MobileNet-YOLOV3 | YOLOv3-tiny | YOLOv3
Fps (TitanX)(300x) | 142 | 208 | 220 | 45
mAP (coco / voc) | NA / 72.7 | 38.9 / 76.3 | 33.1 / NA | 51.5 / NA
Model Size | 22.2 MB | 19.9 MB | 33.8 MB | 237 MB
Optimize FP16 | 11.6 MB (0.523) | NA | 17.7 MB (0.524) | 123.8 (0.522)

Feature extractor MobileNetV2 MobileNetV2 Darknet53

NCTU SSD Lite vs MobileNet-YOLOV3


Segnet: Semantic Segmentation Using BDD100K – Drivable Area

• Network: JacintoSeg Net

• Input Size: 512x512

• MAC: 4.42 (Giga/frame)

• Sparsity: 80.8%

• Top-1 accuracy: 95.72%


Segnet: Semantic Segmentation Using BDD100K – Lane Recognition

• Network: JacintoSeg Net

• Input Size: 512x512

• Classification: Single (red), Double (yellow), Dash (blue)

• Dataset: Generated from BDD100K

Figures: BDD dataset generation, BDD dataset validation, NCTU dataset verification


Combining Lane Area/Lane Mark Segmentation: Experiment (Highway Day)

• Network: Jacinto-based segmentation network

• Input size: 512x1024

• Number of classes: 5
  • Main lane
  • Alternative lane
  • Double lane line
  • Single lane line
  • Dashed lane line

• Optimizer: Adam

• Epoch: 10


Combining Lane/Lane Mark Segmentation: Experiment (Highway Night, City Day)

Highway, night time / City, day time


Combining Object Detection/Lane/Lane Mark Segmentation: Experiment (Highway and City Day)

Highway, day time / City, day time


Fixed Point AI Models

Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference

ICLR2019 [2018.09.11, 2019.02.25]

Jeffrey L. McKinstry, Steven K. Esser, Rathinakumar Appuswamy, Deepika Bablani, John V. Arthur, Izzet B. Yildiz & Dharmendra S. Modha

IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA


Solution for AI Model Quantization

ezQuant: Supporting bit-accurate dynamic quantization on CNN model training and inferencing
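The internals of ezQuant are not described in these slides. As a rough illustration of what dynamic fixed-point quantization usually means, the sketch below picks the number of fractional bits per tensor from its largest magnitude, then rounds and saturates every value to a signed integer code of the given bit-width. The function names and sample values are assumptions for this example, not ezQuant's API:

```python
import math


def choose_fraction_bits(values, bit_width):
    """Pick fractional bits so the largest magnitude fits in a signed code."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    int_bits = math.floor(math.log2(max_abs)) + 1   # bits for the integer part
    return bit_width - 1 - int_bits                 # one bit is kept for the sign


def quantize(values, bit_width):
    """Return (dequantized values, fractional bits) for a signed fixed-point format."""
    frac_bits = choose_fraction_bits(values, bit_width)
    scale = 2.0 ** frac_bits
    lo, hi = -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    codes = [min(hi, max(lo, round(v * scale))) for v in values]   # round, then saturate
    return [c / scale for c in codes], frac_bits


weights = [0.5, -1.25, 3.1, 0.02]
quantized, frac_bits = quantize(weights, 8)
print(frac_bits)   # 5
print(quantized)   # [0.5, -1.25, 3.09375, 0.03125]
```

Because the fractional length is chosen per tensor (weights, layer inputs and outputs, adder and multiplier results), each can use a different bit-width, which matches the per-component bit-width sweeps reported for ezQuant.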


ezQuant: Fixed Point Model Optimization

• Training: voc2007 + voc2012 trainval

• Testing: voc2007

Metric | YOLOv2 | NCTU One stage Pvanet | NCTU SSD Lite
Person AP | 0.773 | 0.812 | 0.776
Car AP | 0.828 | 0.851 | 0.861
Bus AP | 0.821 | 0.858 | 0.818
Motorbike AP | 0.842 | 0.817 | 0.819
Bicycle AP | 0.796 | 0.846 | 0.828
mAP | 0.812 | 0.837 (better quality) | 0.820 (same quality)
Resolution | 416x416 | 512x512 | 512x512
Parameters (million) | 48.22 | 3.61 (reducing 93%) | 3.961 (reducing 91%)
MAC (G)/frame | 12.68 | 3.407 (reducing 73%) | 2.934 (reducing 77%)
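The mAP row is the unweighted mean of the five per-class AP values, and the reduction percentages are relative to YOLOv2. A short sketch that recomputes both from the numbers in the table; the helper names are illustrative only:

```python
per_class_ap = {
    "YOLOv2": [0.773, 0.828, 0.821, 0.842, 0.796],
    "NCTU One stage Pvanet": [0.812, 0.851, 0.858, 0.817, 0.846],
    "NCTU SSD Lite": [0.776, 0.861, 0.818, 0.819, 0.828],
}


def mean_ap(aps):
    """Unweighted mean of per-class average precision."""
    return sum(aps) / len(aps)


def reduction_percent(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - value / baseline)


for model, aps in per_class_ap.items():
    print(model, round(mean_ap(aps), 3))

# Parameter count (million) and MAC (G)/frame relative to YOLOv2.
print(round(reduction_percent(48.22, 3.61), 1), round(reduction_percent(48.22, 3.961), 1))
print(round(reduction_percent(12.68, 3.407), 1), round(reduction_percent(12.68, 2.934), 1))
```

The recomputed mAP values match the table to three decimals, and the reductions agree with the quoted percentages to within about one percentage point.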


ezQuant: Dynamic Quantization Result (NCTU One-stage PVA-Net)

Bit-width

Accuracy

1 0.012426

2 0.007848

3 0.007597

4 0.187862

5 0.586783

6 0.771206

7 0.790516

8 0.813026

9 0.8143

10 0.81696

11 0.81365

12 0.812221

Bit-width

Accuracy

1 0.00129

2 0.00129

3 0.011045

4 0.02398

5 0.045163

6 0.355154

7 0.769702

8 0.814237

9 0.828496

10 0.82371

11 0.820912

12 0.824181

Bit-width

Accuracy

8 0.660734

9 0.780183

10 0.803915

11 0.804695

12 0.803167

13 0.804035

14 0.809527

15 0.807493

16 0.807493

17 0.807493

18 0.807493

19 0.807493

Bit-width

Accuracy

8 0.021775

9 0.388489

10 0.600439

11 0.72778

12 0.788622

13 0.798882

14 0.800949

15 0.803452

16 0.791953

17 0.804113

18 0.803923

19 0.803923

Panels: Weight; Convolution layer input and output; Adder; Multiplier

Test using VOC 5-class dataset (2795 images)

mAP: 0.837 → 0.813 (2.4% drop) → 0.801 (3.6% drop)


ezQuant: Dynamic Quantization Result (NCTU SSD lite)

Bit-width

Accuracy

1 0.010217

2 0.001255

3 0.051122

4 0.573308

5 0.750512

6 0.797244

7 0.808564

8 0.812908

9 0.813709

10 0.814216

11 0.814559

12 0.814224

Bit-width

Accuracy

1 0

2 0

3 0.001525

4 0.006166

5 0.061615

6 0.618732

7 0.764296

8 0.81479

9 0.82228

10 0.820975

11 0.820359

12 0.819915

Bit-width

Accuracy

8 0.65469

9 0.786137

10 0.79859

11 0.798683

12 0.804646

13 0.803486

14 0.805815

15 0.801277

16 0.801277

17 0.801277

18 0.801277

19 0.801277

Bit-width

Accuracy

8 0.022362

9 0.189253

10 0.592589

11 0.736209

12 0.786907

13 0.788416

14 0.786137

15 0.786137

16 0.786137

17 0.786137

18 0.786137

19 0.786137

Panels: Weight; Convolution layer input and output; Adder; Multiplier

Test using VOC 5-class dataset (2795 images)

mAP: 0.820 → 0.801 (1.9% drop) → 0.798 (2.2% drop)




Industrial Collaboration (2018~)



Conclusion: Taiwan’s Opportunity

• AI end-to-end learning to avoid IP infringement
  • Data: self-collected data
  • Model: optimized from open models
  • Platform: adopt commercialized AI SoCs

To meet industrial applications

• Embedded AI (edge intelligence) brings many opportunities for Taiwan’s industry
  • Collect our own data
  • Develop our own AI model
  • Design our own AI chip


Thank you very much for your attention!

http://ivs.ee.nctu.edu.tw/iac/

Our Vision, Your Intelligence!

Q&A
