Page 1:

Dynamic Background Learning through Deep Auto-encoder Networks

Pei Xu1, Mao Ye1, Xue Li2, Qihe Liu1, Yi Yang2 and Jian Ding3 1. University of Electronic Science and Technology of China

2. The University of Queensland3. Tencent Group

Sorry for the no-show due to a visa delay.

Page 2:

Previous Works about Dynamic Background Learning:

Mixture of Gaussian [Wren et al. 2002]

Hidden Markov Model [Rittscher et al., 2000]

1-SVM [Cheng et al. 2009]

DECOLOR [Zhou et al., 2013]

Page 3:

Existing Problems:

1. Many previous works needed clean background images (without foregrounds) to train the classifier.

2. To extract a clean background, some works imposed assumptions on the background images (such as linear correlation).

Page 4:

Preliminaries about Auto-encoder Network

In our work, we use the deep auto-encoder network proposed by Bengio et al. (2007) as the building block.

1. Encoding
In the encoding stage, the input data x ∈ [0,1]^N is encoded by a function defined as:

h_1 = f_1(x) = sigm(W_1 x + b_1),   h_1 ∈ [0,1]^{M_1}

where W_1 ∈ R^{M_1 × N} is a weight matrix, b_1 ∈ R^{M_1} is a hidden bias vector, and sigm(z) = 1/(1 + exp(−z)) is the sigmoid function.

Page 5:

Then h_1 as the input is encoded by another function, written as:

h_2 = f_2(h_1) = sigm(W_2 h_1 + b_2),   h_2 ∈ [0,1]^{M_2}

where W_2 ∈ R^{M_2 × M_1} is a weight matrix and b_2 ∈ R^{M_2} is a bias vector.

(Encoding path: x → h_1 → h_2)

Page 6:

2. Decoding
In the decoding stage, h_2 is the input of the function:

ĥ_1 = g_2(h_2) = sigm(W_2^T h_2 + b_3)

where b_3 ∈ R^{M_1} is a bias vector.

(Decoding path: x → h_1 → h_2 → ĥ_1)

Page 7:

Then the reconstructed output x̂ ∈ [0,1]^N is computed by the decoding function:

x̂ = g_1(ĥ_1) = sigm(W_1^T ĥ_1 + b_4)

where b_4 ∈ R^N is a bias vector.

The parameters (W_i and b_j) are learned by minimizing the cross-entropy function, written as:

L(x) = − Σ_{i=1}^{N} [ x_i log x̂_i + (1 − x_i) log(1 − x̂_i) ]
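The encoding, tied-weight decoding, and cross-entropy loss described above can be sketched in NumPy. The layer sizes, random initialization, and toy input below are our own illustrative choices, not values from the paper:

```python
import numpy as np

def sigm(z):
    # sigm(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N, M1, M2 = 8, 5, 3                         # illustrative layer sizes

# Parameters: encoder weights reused (transposed) in the decoder
W1 = rng.normal(0, 0.1, (M1, N)); b1 = np.zeros(M1)
W2 = rng.normal(0, 0.1, (M2, M1)); b2 = np.zeros(M2)
b3 = np.zeros(M1); b4 = np.zeros(N)

x = rng.random(N)                           # input in [0, 1]^N

# Encoding: x -> h1 -> h2
h1 = sigm(W1 @ x + b1)
h2 = sigm(W2 @ h1 + b2)

# Decoding with transposed weights: h2 -> h1_hat -> x_hat
h1_hat = sigm(W2.T @ h2 + b3)
x_hat = sigm(W1.T @ h1_hat + b4)

# Cross-entropy reconstruction loss
loss = -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))
```

In practice the parameters would then be trained by backpropagating this loss; the sketch only shows a single forward pass.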

Page 8:

Proposed Method

1. Dynamic Background Modeling

a. A deep auto-encoder network is used to extract background images from video frames.
b. A separation function is defined to formulate the background images.
c. Another deep auto-encoder network is used to learn the 'clean' dynamic background.

Page 9:

Inspired by the denoising auto-encoder (DAE) [Vincent et al., 2008], we view the dynamic background and the foreground as 'clean' data and 'noise' data, respectively.

DAE needs 'clean' data to which noise is added, and it learns the distribution of the noise.

Page 10:

Unfortunately, in real-world applications such as traffic monitoring systems, clean background images cannot be obtained.

But do we really need ‘clean’ data to train an auto-encoder network?

Page 11:

Firstly, we use a deep auto-encoder network (named the Background Extraction Network, BEN) to extract a background image from the input video frames. Its cost function is:

min_{θ_E, B_0, σ} L(x^j; θ_E, B_0, σ) = Σ_{j=1}^{D} E(x^j) + Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )

The first term E(x^j) is the cross-entropy reconstruction error; the remaining terms are the background items. The vector B_0 represents the extracted background image, and σ is the tolerance value vector of B_0.

Page 12:

Background Items:

Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )

The first item forces the reconstructed frames to approach a background image B_0. The second, regularization item controls the solution range of σ.

Basic observation of our work: in video sequences, each pixel belongs to the background most of the time.

Page 13:

Background Items:

Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )

To be resilient to large variance tolerances, we divide the approximation error at the ith pixel by the parameter σ_i.

How do we train the parameters of the Background Extraction Network?

Page 14:

The cost function of the Background Extraction Network:

min_{θ_E, B_0, σ} L(x^j; θ_E, B_0, σ) = Σ_{j=1}^{D} E(x^j) + Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )    (1)

The parameters contain θ_E, B_0 and σ, where θ_E = { W_{Ei} (i = 1, 2), b_{Ej} (j = 1, ..., 4) }.

Page 15:

(1) The update of θ_E is:

θ_E ← θ_E − α ∇θ_E

where α > 0 is the learning rate, and ∇θ_E is written as:

∇θ_E = ∂/∂θ_E [ E(x^j) + Σ_{i=1}^{N} |x̂_i^j − B_{0i}| / σ_i ]

(the log σ_i items do not depend on θ_E).

Page 16:

There is an absolute value in the second item, so we adopt a sign function to roughly compute the derivative as follows:

∇θ_E = ∂E(x^j)/∂θ_E + Σ_{i=1}^{N} [ sign(x̂_i^j − B_{0i}) / σ_i ] ∂x̂_i^j/∂θ_E

where sign(a) = 1 if a > 0, sign(a) = 0 if a = 0, and sign(a) = −1 if a < 0.
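The sign-based subgradient of the background item can be sketched as follows. The Jacobian ∂x̂/∂θ is passed in as a matrix; the toy values and the identity "Jacobian" are hypothetical, for illustration only:

```python
import numpy as np

def background_item_subgradient(x_hat, B0, sigma, dxhat_dtheta):
    """Subgradient of sum_i |x_hat_i - B0_i| / sigma_i w.r.t. theta_E,
    using sign() in place of the non-differentiable absolute value.
    dxhat_dtheta: Jacobian d x_hat / d theta, shape (N, P)."""
    s = np.sign(x_hat - B0) / sigma      # per-pixel sign term, shape (N,)
    return s @ dxhat_dtheta              # gradient w.r.t. theta, shape (P,)

# Tiny illustrative numbers (not from the paper)
x_hat = np.array([0.2, 0.8, 0.5])
B0 = np.array([0.3, 0.5, 0.5])
sigma = np.array([0.1, 0.1, 0.2])
J = np.eye(3)                            # pretend Jacobian for the demo
g = background_item_subgradient(x_hat, B0, sigma, J)
# sign terms [-1, 1, 0] scaled by 1/sigma -> [-10., 10., 0.]
```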

Page 17:

(2) The update of B_0 is the optimization problem:

min_{B_0} Σ_{i=1}^{N} Σ_{j=1}^{D} |x̂_i^j − B_{0i}| / σ_i

According to previous works on ℓ1-norm optimization, the optimal B_{0i} is the median of { x̂_i^1, x̂_i^2, ..., x̂_i^D } for i = 1, ..., N.
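The per-pixel median update is a one-liner in NumPy; the toy array of reconstructed frames below is our own illustration:

```python
import numpy as np

# D reconstructed frames x_hat^j, each with N pixels (toy values)
X_hat = np.array([[0.1, 0.9],
                  [0.2, 0.8],
                  [0.9, 0.7]])        # shape (D=3, N=2)

# The l1 cost sum_j |x_hat_i^j - B0_i| is minimized by the median,
# so B0 is the per-pixel median over the D reconstructions
B0 = np.median(X_hat, axis=0)         # -> [0.2, 0.8]
```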

Page 18:

(3) The update of σ_i is the optimization problem:

L(σ_i) = (1/σ_i^2) exp( − |x̂_i^j − B_{0i}| / σ_i )

Optimizing L(σ_i) is equivalent to setting the derivative of its logarithmic form ln L(σ_i) to zero. It follows that

Page 19:

∂ ln L(σ_i) / ∂σ_i = − 2/σ_i + |x̂_i^j − B_{0i}| / σ_i^2 = 0

The optimal σ_i is:

σ_i* = |x̂_i^j − B_{0i}| / 2
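This closed-form update is a single vectorized expression; the toy values are illustrative, and the positivity floor at the end is our own practical addition, not from the slides:

```python
import numpy as np

x_hat = np.array([0.6, 0.3, 0.9])   # reconstructed pixel values (toy)
B0 = np.array([0.5, 0.5, 0.5])      # extracted background (toy)

# Closed-form update from setting d ln L(sigma_i) / d sigma_i = 0
sigma = np.abs(x_hat - B0) / 2.0    # -> [0.05, 0.1, 0.2]

# Practical note (our addition): a small floor keeps sigma positive
# so later divisions by sigma_i stay defined
sigma = np.maximum(sigma, 1e-3)
```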

Page 20:

After the training of the Background Extraction Network (BEN) is finished, for the video frames x^j (j = 1, 2, ..., D) we can get a clean and static background B_0, and the tolerance measure σ of the background variations.

However, the reconstructed output x̂^j is not exactly the background image, though the deep auto-encoder network BEN can remove some foregrounds in some sense.

Page 21:

So we adopt a separation function to further clean the output, which is:

B_i^j = S(x̂_i^j, B_{0i}) = x̂_i^j if |x̂_i^j − B_{0i}| ≤ ε σ_i, and B_{0i} otherwise

where B^j (j = 1, ..., D) are the cleaned background images.

Page 22:

If |x̂_i^j − B_{0i}| ≤ ε σ_i, then B_i^j, the ith pixel of the jth background image, equals x̂_i^j. Otherwise, B_i^j equals B_{0i}. For the input D video frames, we obtain the clean background image set B = { B^1, ..., B^D } (B^j ∈ [0,1]^N) in some sense.
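The per-pixel separation rule maps directly onto `np.where`. We assume here that the tolerance band is ε·σ_i, as suggested by the later discussion of the parameter ε; the toy pixel values are our own:

```python
import numpy as np

def separate(x_hat, B0, sigma, eps):
    """Separation function S: keep the reconstruction where it stays
    within the tolerance band around B0, otherwise fall back to B0."""
    keep = np.abs(x_hat - B0) <= eps * sigma
    return np.where(keep, x_hat, B0)

x_hat = np.array([0.52, 0.95, 0.48])   # reconstructed frame (toy)
B0 = np.array([0.50, 0.50, 0.50])      # extracted background (toy)
sigma = np.array([0.10, 0.10, 0.10])   # tolerance vector (toy)
B_j = separate(x_hat, B0, sigma, eps=0.5)   # -> [0.52, 0.5, 0.48]
```

The middle pixel (0.95) deviates by more than ε·σ from the background, so it is treated as foreground and replaced by B0 there.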

Page 23:

2. Dynamic Background Learning

Another deep auto-encoder network (named Background Learning Network, BLN) is used to further learn the dynamic background model.

Page 24:

The clean background images B = { B^1, ..., B^D } are used as the input data to train the parameters of the BLN. The cost function of the Background Learning Network is:

L_L(B^j, B̂^j) = − Σ_{i=1}^{N} [ B_i^j log B̂_i^j + (1 − B_i^j) log(1 − B̂_i^j) ]

Page 25:

Online Learning

In the previous section, just D frames are used to train the dynamic background model. This limited number of samples may produce an overfitting problem.

To incorporate more data, we propose an online learning method.

Our aim is to find the weight vectors whose effect on the cost function is small.

Page 26:

Firstly, the weight matrix W is rewritten as W = [W_1, W_2, ..., W_M], where W_j is an N-dimensional column vector and M is the number of the higher-layer nodes.

Page 27:

Then, let δL denote the change of L caused by a disturbance δW of W_j (j = 1, 2, ..., M). We have L(W) → L(W + δW), and then

δL = L(W + δW) − L(W)

Page 28:

Using Taylor's theorem, we obtain

δL = ( ∂L(W)/∂W )^T δW + (1/2) δW^T H δW + O(‖δW‖^3)

where H = ∂²L(W)/∂W∂W^T is the Hessian matrix of L. Here we ignore the third-order term.

Page 29:

For a two-hidden-layer auto-encoder network, the optimization problem to solve is:

min_{δW_{E1}^o, δW_{E2}^o} δL(δW_{E1}^o, δW_{E2}^o) = (1/2) tr( (δW^o)^T H δW^o )

s.t.  δW_{E1}^o e_j + W_{E1,j}^o = 0,   δW_{E2}^o e_k + W_{E2,k}^o = 0

where W_{Ei}^o (i = 1, 2) are the weights of the two hidden layers, e_j is the jth column of the M_1 × M_1 identity matrix, and e_k is the kth column of the M_2 × M_2 identity matrix.

Page 30:

We sort the resulting values δL_j^i for i = 1, 2, with j = 1, ..., M_1 and k = 1, ..., M_2, respectively. A vector W_{Ei,j}^o whose δL_j^i falls below a threshold is substituted by a randomly chosen vector W_{Ei,r}^o whose δL lies above it, where the threshold is an artificial parameter.
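As a rough illustration of how such δL scores might be computed, the sketch below simply zeroes a candidate column (a disturbance δW equal to −W_j in that column and 0 elsewhere) and evaluates the quadratic form, rather than solving the constrained problem on the previous slide; the toy Hessian and sizes are our own assumptions:

```python
import numpy as np

def delta_L(W, H, j):
    """Second-order change in the cost when column W[:, j] is zeroed:
    dL ~ 1/2 * dW^T H dW, with H a per-column (N x N) Hessian block."""
    w = W[:, j]
    return 0.5 * w @ H @ w

rng = np.random.default_rng(1)
N, M = 4, 3
W = rng.normal(0, 0.5, (N, M))     # toy weight matrix, columns W_1..W_M
H = np.eye(N)                      # toy Hessian block (identity)

scores = np.array([delta_L(W, H, j) for j in range(M)])
# Columns with small scores barely affect the cost: these are the
# candidates for replacement during online learning
order = np.argsort(scores)
```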

Page 31:

Experimental Results

We use six publicly available video data sets in our experiments, including Jug, Railway, Lights, Trees, Water-Surface, and Fountain to evaluate the performance.

Page 32:

1. Parameter Setting

[Figure: TPR vs. ε on the six data sets (Jug, Lights, Fountain, Railway, WaterSurface, Trees); ε ranges over 0.1-1 and TPR over 0.8-1.]

The different values of ε provide different tolerances of the dynamic background.

Page 33:

We compute the TPR on the six data sets with different values of ε. In the discussion below, we choose, for each data set, the value of ε corresponding to the highest TPR.

Specifically, ε = 0.5, 0.4, 0.4, 0.5, 0.6, 0.4 correspond to Jug, Lights, Fountain, Railway, Water-Surface, and Trees, respectively.

Page 34:

2. Comparisons to Previous Works

Comparisons of ROC Curves

Page 35:

Table 1: Comparisons of F-measure on Fountain, Water-Surface, Trees and Lights

Table 2: Comparisons of F-measure on Jug and Railway

Page 36:

Comparisons of foreground extraction

Page 37:

Comparisons of foreground extraction

Page 38:

Online Learning Strategy Comparison

Comparisons of online learning strategy

Page 39:
Page 40:

Thank you!

Feel free to contact us: [email protected]