Object Detection in Videos with Tubelet Proposal...


Transcript of Object Detection in Videos with Tubelet Proposal...

Page 1: Object Detection in Videos with Tubelet Proposal Networks · openaccess.thecvf.com/content_cvpr_2017/poster/225... · 2018. 1. 23.

Object Detection in Videos with Tubelet Proposal Networks

Kai Kang 1, Hongsheng Li 1, Tong Xiao 1, Wanli Ouyang 1,4, Junjie Yan 2, Xihui Liu 3, Xiaogang Wang 1

1 The Chinese University of Hong Kong, 2 SenseTime Group Limited, 3 Tsinghua University, 4 The University of Sydney

Poster sections: Proposals in Video Object Detection; Framework; Tubelet Proposal Networks; Encoder-decoder LSTM; Experimental Settings; Results on ImageNet VID Dataset; YouTube-Objects Dataset; Qualitative Results

[Figure: proposals in video object detection. Panels: Frames; Per-frame static proposals; Regression to GT Boxes (RegressBox); Regression to GT Movement (RegressMove, Ours). Axes: x, y, t.]

[Figure: framework. Labels: Spatial Anchors (axes x, y, t); Tubelet Proposal Network; Tubelet CNN; Tubelet Generation; Motion Prediction; Tubelet Features; Encoder-decoder LSTM; Encoder LSTM; Decoder LSTM; Classification CNN; Class Label.]

[Figure: Block Initialization. Labels: A, B; 4, 2f, 16, 5f; W2, b2, W5, b5.]

[Figure: Parallel Generation. Labels: Iter 1, Iter 2, Iter 3, Iter 4, Iter 5; l.]

[Figure: encoder-decoder LSTM. Labels: Tubelet Features; Encoder LSTM; Decoder LSTM; Class Label.]

Captions: Initialization of tubelet proposal networks; Results on ImageNet VID validation set; Object localization on YouTube-Objects (YTO) dataset; Qualitative results on ImageNet VID validation set