Page 1:

Dynamic Background Learning through Deep Auto-encoder Networks

Pei Xu1, Mao Ye1, Xue Li2, Qihe Liu1, Yi Yang2 and Jian Ding3 1. University of Electronic Science and Technology of China

2. The University of Queensland3. Tencent Group

Sorry for the no-show due to a visa delay.

Page 2:

Previous Works about Dynamic Background Learning:

Mixture of Gaussian [Wren et al. 2002]

Hidden Markov Model [Rittscher et al., 2000]

1-SVM [Cheng et al. 2009]

DECOLOR [Zhou et al., 2013]

Page 3:

Existing Problems:

1. Many previous works needed clean background images (without foregrounds) to train the classifier.

2. To extract a clean background, some works imposed assumptions on the background images (such as linear correlation).

Page 4:

Preliminaries about Auto-encoder Network

In our work, we use the deep auto-encoder network proposed by Bengio et al. (2007) as the building block.

1. Encoding
In the encoding stage, the input data x ∈ [0,1]^N is encoded by a function defined as:

h_1 = f_1(x) = sigm(W_1 x + b_1),   h_1 ∈ [0,1]^{M_1}

where W_1 ∈ R^{M_1 × N} is a weight matrix, b_1 ∈ R^{M_1} is a hidden bias vector, and sigm(z) = 1/(1 + exp(−z)) is the sigmoid function.

Page 5:

Then h_1 as the input is encoded by another function, written as:

h_2 = f_2(h_1) = sigm(W_2 h_1 + b_2),   h_2 ∈ [0,1]^{M_2}

where W_2 ∈ R^{M_2 × M_1} is a weight matrix and b_2 ∈ R^{M_2} is a bias vector.

(Encoding path: x → h_1 → h_2)

Page 6:

2. Decoding
In the decoding stage, h_2 is the input of the function:

ĥ_1 = g_2(h_2) = sigm(W_2^T h_2 + b_3)

where b_3 ∈ R^{M_1} is a bias vector.

(Decoding path: x → h_1 → h_2 → ĥ_1)

Page 7:

Then the reconstructed output x̂ ∈ [0,1]^N is computed by the decoding function:

x̂ = g_1(ĥ_1) = sigm(W_1^T ĥ_1 + b_4)

where b_4 ∈ R^N is a bias vector.

The parameters (W_i and b_j) are learned by minimizing the cross-entropy function, written as:

L(x) = − Σ_{i=1}^{N} [ x_i log x̂_i + (1 − x_i) log(1 − x̂_i) ]
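The encoding, tied-weight decoding, and cross-entropy loss described above can be sketched in NumPy. The layer sizes, random initialization, and toy input below are our own illustrative choices, not values from the paper:

```python
import numpy as np

def sigm(z):
    # sigm(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N, M1, M2 = 8, 5, 3                         # illustrative layer sizes

# Parameters: encoder weights reused (transposed) in the decoder
W1 = rng.normal(0, 0.1, (M1, N)); b1 = np.zeros(M1)
W2 = rng.normal(0, 0.1, (M2, M1)); b2 = np.zeros(M2)
b3 = np.zeros(M1); b4 = np.zeros(N)

x = rng.random(N)                           # input in [0, 1]^N

# Encoding: x -> h1 -> h2
h1 = sigm(W1 @ x + b1)
h2 = sigm(W2 @ h1 + b2)

# Decoding with transposed weights: h2 -> h1_hat -> x_hat
h1_hat = sigm(W2.T @ h2 + b3)
x_hat = sigm(W1.T @ h1_hat + b4)

# Cross-entropy reconstruction loss
loss = -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))
```

In practice the parameters would then be trained by backpropagating this loss; the sketch only shows a single forward pass.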

Page 8:

Proposed Method

1. Dynamic Background Modeling

a. A deep auto-encoder network is used to extract background images from video frames.
b. A separation function is defined to formulate the background images.
c. Another deep auto-encoder network is used to learn the 'clean' dynamic background.

Page 9:

Inspired by the denoising auto-encoder (DAE) [Vincent et al., 2008], we view the dynamic background and the foreground as 'clean' data and 'noise' data, respectively.

DAE needs 'clean' data to which noise is added, and it learns the distribution of the noise.

Page 10:

Unfortunately, in real-world applications such as traffic monitoring systems, clean background images cannot be obtained.

But do we really need ‘clean’ data to train an auto-encoder network?

Page 11:

Firstly, we use a deep auto-encoder network (named the Background Extraction Network, BEN) to extract a background image from the input video frames. Its cost function is:

min_{θ_E, B_0, σ} L(x^j; θ_E, B_0, σ) = Σ_{j=1}^{D} E(x^j) + Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )

The first term E(x^j) is the cross-entropy reconstruction error; the remaining terms are the background items. The vector B_0 represents the extracted background image, and σ is the tolerance value vector of B_0.

Page 12:

Background Items:

Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )

The first item forces the reconstructed frames to approach a background image B_0. The second, regularization item controls the solution range of σ.

Basic observation of our work: in video sequences, each pixel belongs to the background most of the time.

Page 13:

Background Items:

Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )

To be resilient to large variance tolerances, we divide the approximation error at the ith pixel by the parameter σ_i.

How do we train the parameters of the Background Extraction Network?

Page 14:

The cost function of the Background Extraction Network:

min_{θ_E, B_0, σ} L(x^j; θ_E, B_0, σ) = Σ_{j=1}^{D} E(x^j) + Σ_{j=1}^{D} Σ_{i=1}^{N} ( |x̂_i^j − B_{0i}| / σ_i + 2 log σ_i )    (1)

The parameters contain θ_E, B_0 and σ, where θ_E = { W_{Ei} (i = 1, 2), b_{Ej} (j = 1, ..., 4) }.

Page 15:

(1) The update of θ_E is:

θ_E ← θ_E − α ∇θ_E

where α > 0 is the learning rate, and ∇θ_E is written as:

∇θ_E = ∂/∂θ_E [ E(x^j) + Σ_{i=1}^{N} |x̂_i^j − B_{0i}| / σ_i ]

(the log σ_i items do not depend on θ_E).

Page 16:

There is an absolute value in the second item, so we adopt a sign function to roughly compute the derivative as follows:

∇θ_E = ∂E(x^j)/∂θ_E + Σ_{i=1}^{N} [ sign(x̂_i^j − B_{0i}) / σ_i ] ∂x̂_i^j/∂θ_E

where sign(a) = 1 if a > 0, sign(a) = 0 if a = 0, and sign(a) = −1 if a < 0.
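The sign-based subgradient of the background item can be sketched as follows. The Jacobian ∂x̂/∂θ is passed in as a matrix; the toy values and the identity "Jacobian" are hypothetical, for illustration only:

```python
import numpy as np

def background_item_subgradient(x_hat, B0, sigma, dxhat_dtheta):
    """Subgradient of sum_i |x_hat_i - B0_i| / sigma_i w.r.t. theta_E,
    using sign() in place of the non-differentiable absolute value.
    dxhat_dtheta: Jacobian d x_hat / d theta, shape (N, P)."""
    s = np.sign(x_hat - B0) / sigma      # per-pixel sign term, shape (N,)
    return s @ dxhat_dtheta              # gradient w.r.t. theta, shape (P,)

# Tiny illustrative numbers (not from the paper)
x_hat = np.array([0.2, 0.8, 0.5])
B0 = np.array([0.3, 0.5, 0.5])
sigma = np.array([0.1, 0.1, 0.2])
J = np.eye(3)                            # pretend Jacobian for the demo
g = background_item_subgradient(x_hat, B0, sigma, J)
# sign terms [-1, 1, 0] scaled by 1/sigma -> [-10., 10., 0.]
```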

Page 17:

(2) The update of B_0 is the optimization problem:

min_{B_0} Σ_{i=1}^{N} Σ_{j=1}^{D} |x̂_i^j − B_{0i}| / σ_i

According to previous works on ℓ1-norm optimization, the optimal B_{0i} is the median of { x̂_i^1, x̂_i^2, ..., x̂_i^D } for i = 1, ..., N.
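The per-pixel median update is a one-liner in NumPy; the toy array of reconstructed frames below is our own illustration:

```python
import numpy as np

# D reconstructed frames x_hat^j, each with N pixels (toy values)
X_hat = np.array([[0.1, 0.9],
                  [0.2, 0.8],
                  [0.9, 0.7]])        # shape (D=3, N=2)

# The l1 cost sum_j |x_hat_i^j - B0_i| is minimized by the median,
# so B0 is the per-pixel median over the D reconstructions
B0 = np.median(X_hat, axis=0)         # -> [0.2, 0.8]
```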

Page 18:

(3) The update of σ_i is the optimization problem:

L(σ_i) = (1/σ_i^2) exp( − |x̂_i^j − B_{0i}| / σ_i )

Optimizing L(σ_i) is equivalent to setting the derivative of its logarithmic form ln L(σ_i) to zero. It follows that

Page 19:

∂ ln L(σ_i) / ∂σ_i = − 2/σ_i + |x̂_i^j − B_{0i}| / σ_i^2 = 0

The optimal σ_i is:

σ_i* = |x̂_i^j − B_{0i}| / 2
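This closed-form update is a single vectorized expression; the toy values are illustrative, and the positivity floor at the end is our own practical addition, not from the slides:

```python
import numpy as np

x_hat = np.array([0.6, 0.3, 0.9])   # reconstructed pixel values (toy)
B0 = np.array([0.5, 0.5, 0.5])      # extracted background (toy)

# Closed-form update from setting d ln L(sigma_i) / d sigma_i = 0
sigma = np.abs(x_hat - B0) / 2.0    # -> [0.05, 0.1, 0.2]

# Practical note (our addition): a small floor keeps sigma positive
# so later divisions by sigma_i stay defined
sigma = np.maximum(sigma, 1e-3)
```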

Page 20:

After the training of the Background Extraction Network (BEN) is finished, for the video frames x^j (j = 1, 2, ..., D) we can get a clean and static background B_0, and the tolerance measure σ of the background variations.

However, the reconstructed output x̂^j is not exactly the background image, though the deep auto-encoder network BEN can remove some foregrounds in some sense.

Page 21:

So we adopt a separation function to further clean the output, which is:

B_i^j = S(x̂_i^j, B_{0i}) = x̂_i^j if |x̂_i^j − B_{0i}| ≤ ε σ_i, and B_{0i} otherwise

where B^j (j = 1, ..., D) are the cleaned background images.

Page 22:

If |x̂_i^j − B_{0i}| ≤ ε σ_i, then B_i^j, the ith pixel of the jth background image, equals x̂_i^j. Otherwise, B_i^j equals B_{0i}. For the input D video frames, we obtain the clean background image set B = { B^1, ..., B^D } (B^j ∈ [0,1]^N) in some sense.
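The per-pixel separation rule maps directly onto `np.where`. We assume here that the tolerance band is ε·σ_i, as suggested by the later discussion of the parameter ε; the toy pixel values are our own:

```python
import numpy as np

def separate(x_hat, B0, sigma, eps):
    """Separation function S: keep the reconstruction where it stays
    within the tolerance band around B0, otherwise fall back to B0."""
    keep = np.abs(x_hat - B0) <= eps * sigma
    return np.where(keep, x_hat, B0)

x_hat = np.array([0.52, 0.95, 0.48])   # reconstructed frame (toy)
B0 = np.array([0.50, 0.50, 0.50])      # extracted background (toy)
sigma = np.array([0.10, 0.10, 0.10])   # tolerance vector (toy)
B_j = separate(x_hat, B0, sigma, eps=0.5)   # -> [0.52, 0.5, 0.48]
```

The middle pixel (0.95) deviates by more than ε·σ from the background, so it is treated as foreground and replaced by B0 there.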

Page 23:

2. Dynamic Background Learning

Another deep auto-encoder network (named Background Learning Network, BLN) is used to further learn the dynamic background model.

Page 24:

The clean background images B = { B^1, ..., B^D } are used as the input data to train the parameters of the BLN. The cost function of the Background Learning Network is:

L_L(B^j, B̂^j) = − Σ_{i=1}^{N} [ B_i^j log B̂_i^j + (1 − B_i^j) log(1 − B̂_i^j) ]

Page 25:

Online Learning

In the previous section, just D frames are used to train the dynamic background model. This limited number of samples may produce an overfitting problem.

To incorporate more data, we propose an online learning method.

Our aim is to find the weight vectors whose effect on the cost function is small.

Page 26:

Firstly, the weight matrix W is rewritten as W = [W_1, W_2, ..., W_M], where W_j is an N-dimensional column vector and M is the number of the higher-layer nodes.

Page 27:

Then, let δL denote the change of L caused by a disturbance δW of W_j (j = 1, 2, ..., M). We have L(W) → L(W + δW), and then

δL = L(W + δW) − L(W)

Page 28:

Using Taylor's theorem, we obtain

δL = ( ∂L(W)/∂W )^T δW + (1/2) δW^T H δW + O(‖δW‖^3)

where H = ∂²L(W)/∂W∂W^T is the Hessian matrix of L. Here we ignore the third-order term.

Page 29:

For a two-hidden-layer auto-encoder network, the optimization problem to solve is:

min_{δW_{E1}^o, δW_{E2}^o} δL(δW_{E1}^o, δW_{E2}^o) = (1/2) tr( (δW^o)^T H δW^o )

s.t.  δW_{E1}^o e_j + W_{E1,j}^o = 0,   δW_{E2}^o e_k + W_{E2,k}^o = 0

where W_{Ei}^o (i = 1, 2) are the weights of the two hidden layers, e_j is the jth column of the M_1 × M_1 identity matrix, and e_k is the kth column of the M_2 × M_2 identity matrix.

Page 30:

We sort the resulting values δL_j^i for i = 1, 2, with j = 1, ..., M_1 and k = 1, ..., M_2, respectively. A vector W_{Ei,j}^o whose δL_j^i falls below a threshold is substituted by a randomly chosen vector W_{Ei,r}^o whose δL lies above it, where the threshold is an artificial parameter.
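As a rough illustration of how such δL scores might be computed, the sketch below simply zeroes a candidate column (a disturbance δW equal to −W_j in that column and 0 elsewhere) and evaluates the quadratic form, rather than solving the constrained problem on the previous slide; the toy Hessian and sizes are our own assumptions:

```python
import numpy as np

def delta_L(W, H, j):
    """Second-order change in the cost when column W[:, j] is zeroed:
    dL ~ 1/2 * dW^T H dW, with H a per-column (N x N) Hessian block."""
    w = W[:, j]
    return 0.5 * w @ H @ w

rng = np.random.default_rng(1)
N, M = 4, 3
W = rng.normal(0, 0.5, (N, M))     # toy weight matrix, columns W_1..W_M
H = np.eye(N)                      # toy Hessian block (identity)

scores = np.array([delta_L(W, H, j) for j in range(M)])
# Columns with small scores barely affect the cost: these are the
# candidates for replacement during online learning
order = np.argsort(scores)
```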

Page 31:

Experimental Results

We use six publicly available video data sets in our experiments, including Jug, Railway, Lights, Trees, Water-Surface, and Fountain to evaluate the performance.

Page 32:

1. Parameter Setting

[Figure: TPR vs. ε on the six data sets (Jug, Lights, Fountain, Railway, WaterSurface, Trees); ε ranges over 0.1-1 and TPR over 0.8-1.]

The different values of ε provide different tolerances of the dynamic background.

Page 33:

We compute the TPR on the six data sets with different values of ε. In the discussion below, we choose, for each data set, the value of ε corresponding to the highest TPR.

Specifically, ε = 0.5, 0.4, 0.4, 0.5, 0.6, 0.4 correspond to Jug, Lights, Fountain, Railway, Water-Surface, and Trees, respectively.

Page 34:

2. Comparisons to Previous Works

Comparisons of ROC Curves

Page 35:

Table 1: Comparisons of F-measure on Fountain, Water-Surface, Trees and Lights

Table 2: Comparisons of F-measure on Jug and Railway

Page 36:

Comparisons of foreground extraction

Page 37:

Comparisons of foreground extraction

Page 38:

Online Learning Strategy Comparison

Comparisons of online learning strategy

Page 39:
Page 40:

Thank you!

Feel free to contact us: [email protected]