Learning and Recognizing Activities in Streams of Video Dinesh Govindaraju.

Post on 22-Dec-2015


Learning and Recognizing Activities in Streams of Video

Dinesh Govindaraju

Motivation

Activity recognition from video for higher functionality:

Who is presenting an agenda item

Attendee interest levels

Motivation

Want it to be automatic and not involve hand generation of models:

Impractical in the case of many activities

Less versatile as you might be constrained to particular aspects of the problem

Problem Definition

Video data

Observations are extracted: movement deltas via face tracking

Hand label training segments

Learn underlying models from training segments

Carry out activity recognition
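A minimal sketch of the observation extraction step, assuming the face tracker reports one (x, y) face centre per frame in pixels. The function name and the input format are hypothetical; the slides do not say how the deltas were represented or quantized.

```python
def movement_deltas(face_centers):
    """Return the (dx, dy) movement between each pair of consecutive frames.

    face_centers: list of (x, y) tuples, one per frame (assumed tracker output).
    """
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(face_centers, face_centers[1:])
    ]


if __name__ == "__main__":
    print(movement_deltas([(10, 10), (12, 10), (12, 9)]))
```

Integer tuples are hashable, so each distinct delta can later serve directly as a discrete observation symbol.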

Approach - Learning

Assume underlying models can be approximated by HMMs

Use Baum-Welch to learn the best model for each activity from its training segments

Need to find observation space and number of states
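To make the learning step concrete, here is a small Baum-Welch sketch for a discrete HMM trained on several segments at once. It is an illustration under assumptions, not the presenter's implementation: observations are taken to be integer symbol indices, the names `baum_welch` and `forward_backward` are made up for this example, and the fixed iteration count is arbitrary.

```python
import math
import random


def _normalize(row):
    # Fall back to a uniform row if all expected counts are zero.
    total = sum(row)
    return [x / total for x in row] if total > 0 else [1.0 / len(row)] * len(row)


def _random_rows(n_rows, n_cols, rng):
    # Strictly positive random rows, each normalized to sum to 1.
    return [_normalize([rng.random() + 0.1 for _ in range(n_cols)])
            for _ in range(n_rows)]


def forward_backward(obs, pi, A, B):
    """Scaled forward and backward passes for one observation sequence."""
    n, T = len(pi), len(obs)
    alpha, scales = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        scales.append(sum(row))  # assumed > 0, true for strictly positive start values
        alpha.append([x / scales[t] for x in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scales[t + 1] for i in range(n)]
    return alpha, beta, scales


def baum_welch(sequences, n_states, n_symbols, n_iter=20, seed=0):
    """Return ((pi, A, B), history); history[k] is the log-likelihood before update k."""
    rng = random.Random(seed)
    pi = _random_rows(1, n_states, rng)[0]
    A = _random_rows(n_states, n_states, rng)
    B = _random_rows(n_states, n_symbols, rng)
    history = []
    for _ in range(n_iter):
        pi_num = [0.0] * n_states
        A_num = [[0.0] * n_states for _ in range(n_states)]
        B_num = [[0.0] * n_symbols for _ in range(n_states)]
        loglik = 0.0
        for obs in sequences:
            alpha, beta, scales = forward_backward(obs, pi, A, B)
            loglik += sum(math.log(c) for c in scales)
            for t in range(len(obs)):
                for i in range(n_states):
                    gamma = alpha[t][i] * beta[t][i]  # P(state i at time t | obs)
                    B_num[i][obs[t]] += gamma
                    if t == 0:
                        pi_num[i] += gamma
                    if t + 1 < len(obs):
                        for j in range(n_states):
                            # Expected count of the transition i -> j at time t.
                            A_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / scales[t + 1])
        history.append(loglik)
        pi = _normalize(pi_num)
        A = [_normalize(row) for row in A_num]
        B = [_normalize(row) for row in B_num]
    return (pi, A, B), history
```

The per-step scaling keeps the forward and backward values in a safe numeric range, and the recorded log-likelihoods should never decrease from one iteration to the next, which is a convenient sanity check.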

Approach - Learning

To find observation space:

Run through all training segments and add observations

For a new observation encountered during recognition, augment the learned observation matrices
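One way to picture these two steps: index every distinct training observation, then widen each learned observation (emission) matrix when recognition meets an unseen symbol. The slides do not say how the matrices were augmented, so the small `floor` probability and the row renormalization below are assumptions, as are the function names.

```python
def build_observation_space(training_segments):
    """Map each distinct observation to an index, in order of first appearance."""
    index = {}
    for segment in training_segments:
        for obs in segment:
            if obs not in index:
                index[obs] = len(index)
    return index


def augment_for_new_observation(index, emission, obs, floor=1e-6):
    """Return the column index of obs, adding a column if obs is unseen.

    emission: list of rows, one per state, each a probability distribution
    over symbols. A new symbol gets probability mass `floor` in every state
    (an assumed choice) and each row is renormalized in place.
    """
    if obs in index:
        return index[obs]
    index[obs] = len(index)
    for row in emission:
        row.append(floor)
        total = sum(row)
        row[:] = [p / total for p in row]
    return index[obs]


if __name__ == "__main__":
    space = build_observation_space([[(1, 0), (0, 0)], [(0, 0), (-1, 0)]])
    B = [[0.5, 0.25, 0.25], [0.2, 0.2, 0.6]]
    print(augment_for_new_observation(space, B, (0, 1)), B)
```

A non-zero floor keeps a window containing one unseen observation from receiving zero likelihood under every activity model.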

Approach - Learning

To find number of states, Q (for each activity):

Set upper bound as length of longest training segment

Iterate over values and generate most likely model using Baum-Welch
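The search over Q can be sketched as a loop that fits one model per candidate value. Here `fit` is a hypothetical callable standing in for a Baum-Welch trainer, and starting the range at 1 is an assumption; the slides give only the upper bound.

```python
def candidate_state_counts(training_segments):
    """Candidate values of Q: 1 up to the length of the longest training segment."""
    longest = max(len(segment) for segment in training_segments)
    return range(1, longest + 1)


def models_per_state_count(training_segments, fit):
    """Fit one model per candidate Q.

    fit(segments, q) -> model is a placeholder for a Baum-Welch trainer.
    """
    return {
        q: fit(training_segments, q)
        for q in candidate_state_counts(training_segments)
    }


if __name__ == "__main__":
    segments = [[0, 1, 1], [0, 1, 1, 2, 2], [0, 2]]
    # Dummy trainer that only records its arguments, for illustration.
    models = models_per_state_count(segments, lambda segs, q: ("model", q))
    print(sorted(models))
```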

Approach - Learning

To find number of states, Q (for each activity):

Choose best Q using N-fold cross validation, with discriminative power as the criterion

With best Q, run Baum-Welch using a number of sets of randomly initialized parameters to get λa
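The selection logic might look like the sketch below. The slides do not define "discriminative power", so the margin used here (held-out log-likelihood under the activity's own model minus the best competing model) is an assumption, as is keeping the random restart with the highest log-likelihood. All function names are hypothetical.

```python
def k_fold_splits(n_items, n_folds):
    """Yield (train_indices, held_out_indices) for each of n_folds folds."""
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for held_out in folds:
        train = [i for i in range(n_items) if i not in held_out]
        yield train, held_out


def discriminative_margin(loglik_own, logliks_other):
    """Assumed criterion: own-model log-likelihood minus the best competitor's."""
    return loglik_own - max(logliks_other)


def choose_q(margins_by_q):
    """Pick the Q with the highest mean margin; ties go to the smaller Q.

    margins_by_q: {q: [margin for each held-out segment]}.
    """
    return max(
        sorted(margins_by_q),
        key=lambda q: sum(margins_by_q[q]) / len(margins_by_q[q]),
    )


def best_of_restarts(fit_with_seed, seeds):
    """Keep the (model, loglik) pair with the highest log-likelihood.

    fit_with_seed(seed) -> (model, loglik) is a placeholder for one
    randomly initialized Baum-Welch run.
    """
    return max((fit_with_seed(seed) for seed in seeds), key=lambda run: run[1])
```

With the four training segments per activity reported later in the deck, four folds would amount to leave-one-out validation, although the value of N actually used is not stated.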

Approach - Recognition

Define a window width, w

From the beginning, sequentially consider windows of observations (where L is length of entire sequence)
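As a sketch of the windowing, assuming the window advances one observation at a time (the stride is not stated on the slide), a sequence of length L yields L - w + 1 windows:

```python
def sliding_windows(observations, w):
    """Yield (start, window) for every length-w window, moving one step at a time."""
    L = len(observations)
    for start in range(L - w + 1):
        yield start, observations[start:start + w]


if __name__ == "__main__":
    for start, window in sliding_windows([1, 2, 3, 4, 5], 3):
        print(start, window)
```

Under that assumption, the 1296-frame sequence and window width of 21 used in the evaluation would give 1276 windows.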

Approach - Recognition

Calculate likelihood of each window segment

L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 1989
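The likelihood of a window under one activity's HMM can be computed with the forward algorithm described in the tutorial cited above. The sketch below uses per-step scaling and returns a log-likelihood; the argument layout (`pi` as a list, `A` and `B` as lists of rows, observations as symbol indices) is an assumption made for this example.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    pi[i]: initial probability of state i
    A[i][j]: transition probability from state i to state j
    B[i][k]: probability of emitting symbol k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


if __name__ == "__main__":
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.5], [0.1, 0.9]]
    print(math.exp(log_likelihood([0, 1], pi, A, B)))
```

Working in log space avoids numerical underflow, which matters once windows are a few dozen observations long.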

Approach - Recognition

Label middle frame in each window with activity with highest likelihood
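A sketch of the labeling rule, assuming one score dictionary per window (activity name to log-likelihood) and windows that start at consecutive frames. How the frames near the two ends of the sequence were handled is not stated, so they are left as `None` here.

```python
def label_middle_frames(window_scores, w, n_frames):
    """Label the middle frame of each window with the most likely activity.

    window_scores[s] maps activity -> log-likelihood for the window that
    starts at frame s. Frames that are never a window centre stay None.
    """
    labels = [None] * n_frames
    for start, scores in enumerate(window_scores):
        labels[start + w // 2] = max(scores, key=scores.get)
    return labels


if __name__ == "__main__":
    scores = [
        {"a": -1.0, "b": -2.0},
        {"a": -3.0, "b": -2.5},
        {"a": -0.5, "b": -4.0},
    ]
    print(label_middle_frames(scores, w=3, n_frames=5))
```

With w = 21 the centre offset is 10, so under this scheme the first and last 10 frames would receive no label.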

Evaluation and Results

Activities being observed:

Evaluation and Results

Observation stream obtained from 87 second long image sequence

1296 individual frames

Example frames after face detection:

Evaluation and Results

Observation sequence first hand labeled

Segments showing same activity extracted

4 training segments used to learn each activity

Evaluation and Results

Once underlying models were learned, calculate likelihood using sliding window

Value of 21 was used for the window width, w, as this was the average length of training segments

Evaluation and Results

Carry out recognition using the likelihoods by assigning activities to the frames

Compare against hand assigned labels

Accuracy approximately 76%
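The comparison against hand labels reduces to a per-frame agreement rate. In this sketch, frames the algorithm left unlabeled (`None`) are excluded from the count, which is an assumption; the slides do not say how such frames were scored.

```python
def frame_accuracy(predicted, hand_labels):
    """Fraction of labeled frames whose predicted activity matches the hand label."""
    pairs = [(p, h) for p, h in zip(predicted, hand_labels) if p is not None]
    return sum(p == h for p, h in pairs) / len(pairs)


if __name__ == "__main__":
    predicted = [None, "a", "b", "a", None]
    hand = ["a", "a", "a", "a", "b"]
    print(frame_accuracy(predicted, hand))
```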

Evaluation and Results

Algorithm assigned:

Different from hand label

Same as hand label

Evaluation and Results

Hand assigned:

Different from algorithm label

Same as algorithm label

Future Work

Learn underlying model generating sequence of activities themselves

Standardize lengths of training segments using Dynamic Time Warping and use that as the window width
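For reference, a textbook dynamic time warping routine is sketched below. It returns the alignment cost and the warping path between two numeric sequences; how that path would be used to standardize segment lengths is left open, since the slide only names the technique. The function name and the absolute-difference distance are choices made for this example.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return (cost, path) of the optimal alignment between sequences a and b.

    path is a list of (i, j) index pairs, from (0, 0) to (len(a)-1, len(b)-1).
    Both sequences must be non-empty.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(a[i - 1], b[j - 1]) + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    print(dtw([1, 2, 3], [1, 2, 2, 3]))
```

The path pairs every element of one segment with at least one element of the other, which is the information needed to stretch or compress a segment onto a reference length.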

The End

Questions