
DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking

Hanxi Li 1,2

hanxi.li@nicta.com.au

Yi Li 1

http://users.cecs.anu.edu.au/~yili/

Fatih Porikli 1

http://www.porikli.com/

1 NICTA and ANU, Canberra, Australia

2 Jiangxi Normal University, Jiangxi, China

Defining hand-crafted feature representations needs expert knowledge, requires time-consuming manual adjustments, and is arguably one of the limiting factors of object tracking.

In this paper, we propose a novel solution that automatically relearns the most useful feature representations during the tracking process, in order to adapt accurately to appearance changes and pose and scale variations while preventing drift and tracking failures. We employ a candidate pool of multiple Convolutional Neural Networks (CNNs) as a data-driven model of different instances of the target object. Individually, each CNN maintains a specific set of kernels that favourably discriminate object patches from their surrounding background using all available low-level cues (Fig. 1). These kernels are updated in an online manner at each frame, after being trained with just one instance at the initialization of the corresponding CNN.

[Figure 1 content, condensed from the extracted image text: for each of the four low-level cues, a normalized 32 × 32 input patch is convolved with 13 × 13 kernels into 9 feature maps of 20 × 20 (Layer-1), subsampled 2 × 2 into 9 maps of 10 × 10 (Layer-2), convolved with 9 × 9 kernels into 18 maps of 2 × 2 (Layer-3), subsampled 2 × 2 into an 18 × 1 feature vector, and fully connected to a 2 × 1 output label. In the class-specific version (red box), the Layer-1 filters of a class-specific cue are pre-learned from training faces.]

Figure 1: Overall architecture with (red box) and without (rest) the class-specific version.
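For concreteness, the per-cue network of Figure 1 can be written down in a few lines. The following is a minimal sketch in PyTorch (an illustrative reimplementation, not the authors' code); the sigmoid activations and average-pooling subsampling are assumptions not fixed by the figure:

```python
import torch
import torch.nn as nn

class CueCNN(nn.Module):
    """Per-cue network following Figure 1: 32x32 patch -> 2x1 label."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 9, kernel_size=13),  # Layer-1: 9 maps, 32x32 -> 20x20
            nn.Sigmoid(),                     # activation assumed, not specified
            nn.AvgPool2d(2),                  # subsampling: 20x20 -> 10x10 (Layer-2)
            nn.Conv2d(9, 18, kernel_size=9),  # Layer-3: 18 maps, 10x10 -> 2x2
            nn.Sigmoid(),
            nn.AvgPool2d(2),                  # 2x2 -> 1x1, i.e. an 18x1 feature vector
        )
        self.classifier = nn.Linear(18, 2)    # fully connected to a 2x1 label

    def forward(self, x):                     # x: (N, 1, 32, 32) normalized patches
        f = self.features(x).flatten(1)       # (N, 18)
        return self.classifier(f)             # (N, 2) object-vs-background scores

patch = torch.randn(1, 1, 32, 32)
print(CueCNN()(patch).shape)                  # torch.Size([1, 2])
```

With 13 × 13 and 9 × 9 kernels on a 32 × 32 input, the spatial sizes reduce exactly as in Figure 1: 20 × 20, then 10 × 10, then 2 × 2, and finally an 18 × 1 feature vector.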

Instead of learning one complicated and powerful CNN model for all the appearance observations in the past, we choose a relatively small number of filters in each CNN, within a framework equipped with a temporal adaptation mechanism (Fig. 2). Given a frame, the most promising CNNs in the pool are selected to evaluate the hypotheses for the target object. The hypothesis with the highest score is assigned as the current detection window, and the selected models are retrained using a warm-start backpropagation which optimizes a structural loss function.

[Figure 2 content, condensed: a pool of CNNs (CNN-1, CNN-2, CNN-3, CNN-4, …, CNN-20), each paired with a stored prototype, is maintained along the time axis (example frames #34, #162, #208). Every frame runs a test, estimate, train cycle: ① find the nearest CNNs via prototype matching; ② run those CNNs; ③ select and rank their hypotheses; ④ warm-start training from the selected CNN (e.g. CNN-3); ⑤ decide whether a new CNN is needed; ⑥ either update the selected CNN or add a new CNN to the pool.]

Figure 2: Illustration of the temporal adaptation mechanism.
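In code, the per-frame loop of Figure 2 has roughly the following shape. This is a structural sketch only: `prototype_distance`, `evaluate` and `retrain` are hypothetical placeholders for the paper's prototype matching, hypothesis scoring and warm-start backpropagation, and `k` and `tau` are illustrative parameters, not values from the paper:

```python
# Hypothetical skeleton of the per-frame temporal adaptation (Figure 2).
# All method names and thresholds are placeholders, not the authors' API.

def track_frame(frame, candidates, pool, k=3, tau=0.5):
    # (1) nearest CNNs via prototype matching
    nearest = sorted(pool, key=lambda cnn: cnn.prototype_distance(frame))[:k]
    # (2)+(3) score every candidate window with the nearest CNNs, then
    # select and rank: the highest-scoring hypothesis is the detection window
    scored = ((cnn.evaluate(frame, w), cnn, w)
              for cnn in nearest for w in candidates)
    _, best_cnn, best_window = max(scored, key=lambda t: t[0])
    # (4) warm-start retraining from the selected CNN
    updated = best_cnn.retrain(frame, best_window)
    # (5)+(6) if the new appearance is far from every stored prototype,
    # add a new CNN to the pool; otherwise update the selected one in place
    if min(cnn.prototype_distance(frame) for cnn in pool) > tau:
        pool.append(updated)
    else:
        pool[pool.index(best_cnn)] = updated
    return best_window
```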

Our experiments on a large selection of videos from the recent benchmarks demonstrate that our method outperforms the existing state-of-the-art algorithms and rarely loses track of the target object. We evaluate our method on 16 benchmark video sequences that cover the most challenging tracking scenarios, such as scale changes, illumination changes, occlusions, cluttered backgrounds and fast motion (Fig. 3).

[Figure 3 content, condensed: left panel, Precision plots (Precision vs. location error threshold, 0 to 50 pixels); right panel, Success plots (Success rate vs. overlap threshold, 0 to 1). Compared trackers: CNN (ours), SCM, Struck, CXT, ASLA, TLD, IVT, MIL, Frag and CPF.]

Figure 3: The Precision Plot (left) and the Success Plot (right) of the compared visual tracking methods.
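Both plots follow the standard benchmark protocol: precision is the fraction of frames whose predicted center lies within a given pixel distance of the ground truth, and success is the fraction of frames whose predicted box overlaps the ground truth beyond a given IoU threshold. A minimal sketch of the two curves, assuming boxes are stored as (x, y, w, h) rows of NumPy arrays:

```python
import numpy as np

def center_errors(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers.
    Boxes are (N, 4) arrays in (x, y, w, h) format."""
    cp = pred[:, :2] + pred[:, 2:] / 2.0
    cg = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(cp - cg, axis=1)

def iou(pred, gt):
    """Intersection-over-union of two (N, 4) box arrays in (x, y, w, h)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision_curve(pred, gt, thresholds=np.arange(0, 51)):
    """Precision plot: fraction of frames with center error <= threshold."""
    err = center_errors(pred, gt)
    return np.array([(err <= t).mean() for t in thresholds])

def success_curve(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Success plot: fraction of frames with IoU > threshold."""
    ov = iou(pred, gt)
    return np.array([(ov > t).mean() for t in thresholds])
```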

In certain applications, the target object belongs to a known class of objects, such as human faces. Our method can use this prior information to improve tracking performance by training a class-specific detector. In the tracking stage, given the particular instance information, one needs to combine the class-level detector and the instance-level tracker in a certain way, which usually leads to higher model complexity (Fig. 4).
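One plausible reading of the class-specific branch in Figure 1 is to pre-learn the Layer-1 filters offline on faces and keep them fixed while the remaining layers adapt online to the tracked instance. A hedged sketch, reusing the hypothetical `CueCNN` module above (the freezing policy is an assumption, not the paper's stated procedure):

```python
def make_class_specific(pretrained_face_cnn: CueCNN) -> CueCNN:
    """Initialize a tracker CNN whose first layer carries filters
    pre-learned on faces (the class-specific cue); only the remaining
    layers adapt online to the particular instance being tracked."""
    tracker = CueCNN()
    tracker.features[0].load_state_dict(
        pretrained_face_cnn.features[0].state_dict())
    for p in tracker.features[0].parameters():
        p.requires_grad = False   # keep the class-level filters fixed
    return tracker
```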

[Figure 4 content, condensed: Precision plots (left) and Success plots (right) with the same axes as Figure 3, comparing Class-CNN, CNN and Normal-SGD.]

Figure 4: The Precision Plot (left) and the Success Plot (right) of the compared variants (class-specific CNN, baseline CNN, and normal SGD training).

To conclude, we introduced DeepTrack, a CNN-based online object tracker. We employed a CNN architecture and a structural loss function that handle multiple input cues and class-specific tracking. We also proposed an iterative procedure which speeds up the training process significantly. Together with the CNN pool, our experiments demonstrate that DeepTrack performs very well on the 16 sequences.