Single person pose recognition and tracking

Post on 05-Dec-2014


Slides for my Master Thesis defense at TUDelft.


Single Person Pose Recognition and Tracking

Defender: Javier Barbadillo Amor
Information and Communication Theory (ICT) Group
Delft University of Technology

Committee: Dr. Alan Hanjalic, Dr. Emile A. Hendriks, PhD. Feifei Huo, Dr. Pavel Paclik

25-06-2010

2

3

Outline:

Introduction
• Single Person Pose Recognition and Tracking System

Theory

The goal of this research
• Improve Body Parts Detection and Pose recognition

Experiments and Results
• Improved Hand Detection
• Detection of a new class: Non-Pose Classification

Conclusions

Future Work

4

Introduction

5

Single Person Pose Recognition and Tracking System

• Real time
• One single camera
• Game control with detected poses

Theory: Background Subtraction by Mixture of Gaussians

• Compare the current frame with a model of the background.
• Obtain a binary image with the foreground pixels.

6

7

Theory: Background Subtraction by Mixture of Gaussians

• History of pixel intensity values:

• An intensity value belongs to a Gaussian distribution if it lies within 2.5σ of that distribution's mean μ, i.e. in [μ − 2.5σ, μ + 2.5σ].

• Each pixel is modeled by K Gaussians.
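The matching rule above can be sketched for a single pixel as follows. This is a minimal illustration, not the system's implementation: the `Gaussian` record, the function names and the `background_weight` threshold are assumptions, and the foreground decision is a simplified stand-in for whatever rule the thesis actually uses.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float    # mean intensity of this mode
    sigma: float   # standard deviation of this mode
    weight: float  # share of the pixel's history explained by this mode


def matching_gaussian(intensity, gaussians, n_sigma=2.5):
    """Return the index of the first Gaussian whose mean is within
    n_sigma standard deviations of the intensity, or None if none match."""
    for i, g in enumerate(gaussians):
        if abs(intensity - g.mean) <= n_sigma * g.sigma:
            return i
    return None


def is_foreground(intensity, gaussians, background_weight=0.3):
    """Assumed decision rule: a pixel is foreground when it matches no
    Gaussian, or only one whose weight is too low to count as background."""
    i = matching_gaussian(intensity, gaussians)
    return i is None or gaussians[i].weight < background_weight
```

Applying `is_foreground` to every pixel of a frame would give the binary foreground image mentioned earlier; updating the K Gaussians over time is left out of this sketch.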

Theory: Particle Filter for tracking the torso and head

8

• Torso and head region detection

• Hand Detection

Theory: Particle filter for tracking torso and head

9

• N particles are generated

• Weights π assigned according to measured probability.

• Each parent particle spreads into G child particles
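One way to picture the weighting and spreading steps is the sketch below. It assumes each particle is an (x, y, scale) tuple, that parents are drawn in proportion to their weights, and that children are scattered with Gaussian noise; the function name, the noise values and the selection scheme are illustrative assumptions, not details taken from the slides.

```python
import random


def resample_and_spread(particles, weights, G, noise=(2.0, 2.0, 0.02), rng=None):
    """Draw parent particles in proportion to their weights, then spread
    each parent into G children by adding Gaussian noise to x, y and scale.

    particles: list of (x, y, scale) tuples
    weights:   non-negative weights, one per particle, not all zero
    """
    rng = rng or random.Random()
    n_parents = max(1, len(particles) // G)  # keeps the set size roughly constant
    parents = rng.choices(particles, weights=weights, k=n_parents)
    children = []
    for x, y, scale in parents:
        for _ in range(G):
            children.append((
                x + rng.gauss(0.0, noise[0]),
                y + rng.gauss(0.0, noise[1]),
                max(1e-6, scale + rng.gauss(0.0, noise[2])),  # scale stays positive
            ))
    return children
```

With this scheme, particles with high measured probability produce most of the next generation, so the sample set concentrates around likely torso positions.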

10

• Sample sets of particles are generated for 3 states: x, y and Scale

• The probability of the state of the torso is given by

Theory: Particle filter for tracking torso and head

Primitive for torso and head

11

• Hand Detection with general skin color model

Theory: General skin color detection

• Relative distances between hands and torso center.
• Angles of the hands with the torso center.
• r, l and t stand for right, left and torso.
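The distance and angle features can be sketched as below. The transcript does not give the exact formulas, so the Euclidean distance, the `atan2` angle convention and the optional `scale` normalisation are assumptions, and the sketch computes only one distance and one angle per hand.

```python
import math


def hand_features(right, left, torso, scale=1.0):
    """Distance and angle of each hand relative to the torso center.

    right, left, torso: (x, y) image coordinates of the two hands and the
    torso center. scale is a hypothetical normalisation factor (for example
    the tracked torso scale) that makes distances relative to body size.
    """
    features = {}
    for name, (hx, hy) in (("r", right), ("l", left)):
        dx, dy = hx - torso[0], hy - torso[1]
        features["d_" + name] = math.hypot(dx, dy) / scale
        features["angle_" + name] = math.atan2(dy, dx)  # radians in (-pi, pi]
    return features
```

For example, `hand_features((3, 4), (-2, 0), (0, 0))` gives a right-hand distance of 5 and a left-hand angle of π.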

12

Theory: Feature extraction

• Incoming observations = 6-feature-set
• Classifier decides one Pose class.
• Each Pose number is a different action in the game

13

Theory: Pose Classification

The goal of this research
• Improve the system performance
– Hand detection: fails for short sleeves and “skin color clothes”
– Pose recognition: detect Non-Poses

14

Hands detected in the forearm

The 9 Predefined Poses

15

Experiments and Results

Skin color detection combined with human blob silhouette for hand detection

16

• Preliminary hand position is obtained from the center of gravity of the biggest skin blobs.

• First, general skin color detection is applied using this mask:

17

Skin color detection combined with human blob silhouette for hand detection

We check if the blob is:

- Below the hip

- Between the hip and the shoulder

- Over the shoulder
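The three-way check can be sketched as a comparison of vertical positions. The sketch assumes image coordinates in which y grows downward and takes the shoulder and hip heights as given inputs (presumably derived from the human blob silhouette); the function name and return labels are hypothetical, and what the system does in each case is not stated in the transcript.

```python
def blob_region(blob_y, shoulder_y, hip_y):
    """Classify a skin blob by the vertical position of its center.

    Assumes image coordinates where y grows downward, so the shoulder
    line has a smaller y value than the hip line.
    """
    if blob_y > hip_y:
        return "below_hip"
    if blob_y < shoulder_y:
        return "over_shoulder"
    return "between_hip_and_shoulder"
```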

18

Skin color detection combined with human blob silhouette for hand detection

• All the cases where people wear short sleeves or “skin color clothes” are now detected correctly.

• Non-Pose classification

19

DEFINITION: Everything different from a predefined Pose

More features Needed!

• Clear Non-Pose: poses where one or both hand positions are in between positions corresponding to Poses

20

Non-Pose classification

21

Non-Pose classification
• 17 videos from 17 different people were recorded.
• Features extracted from each frame by processing the videos.
• Multiple labeling with PRSD Studio, Matlab.

22

Non-Pose classification

Initial Dataset Improved Dataset

23

Non-Pose classification

• Experiments with Initial Dataset

First approach:

Second approach:

• “Leave one person out” method for realistic results.
• Tested on Parzen, K-Nearest-Neighbor and Gaussian classifiers.

24

Non-Pose classification: Initial Dataset

ROC curve

• Mean error from LOPO (leave one person out): the errors for all people are summed and divided by the number of people.
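The leave-one-person-out procedure and its mean error can be sketched as below. The sample layout `(person, features, label)` and the `train_and_test` callback are assumptions made for illustration; any classifier that returns an error rate on the held-out person could be plugged in.

```python
def leave_one_person_out(samples, train_and_test):
    """Hold out each person in turn and average the resulting errors.

    samples:        list of (person, features, label) tuples
    train_and_test: hypothetical callback taking (train, test) sample lists
                    and returning the error rate on the test samples
    Returns (errors_per_person, mean_error).
    """
    people = sorted({person for person, _, _ in samples})
    errors = {}
    for held_out in people:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        errors[held_out] = train_and_test(train, test)
    # Mean error: errors from all people summed and divided by the number of people.
    mean_error = sum(errors.values()) / len(people)
    return errors, mean_error
```

Because no frames of the test person appear in training, the estimate reflects how the system behaves for a player it has never seen.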

25

• In general, the best results for detecting Poses are obtained with K-Nearest-Neighbor.
• Parzen and Gaussian are considerably worse.
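For reference, a K-Nearest-Neighbor decision can be sketched in a few lines. This is a generic majority-vote version with Euclidean distance, not the PRSD Studio implementation used in the experiments; the function name and data layout are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=5):
    """Label a query feature vector by majority vote among its k nearest
    training samples (Euclidean distance).

    train: list of (features, label) pairs, features being equal-length tuples
    """
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Here a label could be "Pose" versus "Non-Pose" for detection, or one of the individual Pose classes for classification.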

Non-Pose classification: Initial Dataset

K-Nearest-Neighbor

26

• Parzen classifier shows more interesting results for a particular person (Hasan).

27

Results for Hasan's samples as Test, from a single experiment.

• Pose 9 has 404 samples in total.

- 120 from Hasan (Test)
- 171 from Saleem (Training)

Is there any relation?

Non-Pose classification: Initial Dataset

• Two correct ways of performing the same Pose result in quite different features.
• Errors in Parzen give us an idea of how to improve K-NN performance even further.

28

Non-Pose classification: Initial Dataset

29

10-NN 5-NN

• All the samples from Carmen are misclassified as Non-Poses.

Non-Pose classification: Initial Dataset

Cascade of detector and classifier

• Results are much better with this approach than with the cascade. Single Pose classes seem to be better modelled than the whole Poses class with K-Nearest-Neighbor.

30

Second approach:

Non-Pose classification: Initial Dataset

31

• Experiments with Improved Dataset

– All classes have more than 100 samples

• For 10-NN the error on Poses decreased by 1.5% and the error on Non-Poses decreased by 3%.
• Having more samples from single Poses makes the whole Poses class more robust.

Mean errors for the detector trained on Poses.

Non-Pose classification: Improved Dataset

Mean errors for detector trained on Non-Poses.

32

• Training on Non-Poses doesn't improve detection.

• Non-Poses are more difficult to model than Poses.

Non-Pose classification: Improved Dataset

33

• Now, Carmen's samples of Pose 3 are correctly detected as Poses.
• Pose 3 class is more compact.

Initial Dataset Improved Dataset

Non-Pose classification

34

Non-Pose classification

Decreased from 2% to 0%

Increased from 0% to 1%!!!

• More samples of Poses 3 and 4 improved the detection of Poses and Non-Poses, but didn't improve classification of the Pose classes.

Conclusions
• The improved hand detection is a simple but robust method, and solves the problem of wrong detection for short sleeves.

• The Non-Pose class is difficult to model because it overlaps with Poses and is not a compact class. Even so, almost 80% of Non-Poses can be detected.

• Having a good dataset might improve results drastically.
– Samples must represent different people and ways of performing poses
– Samples of wrong hand detections increase the error rate

• K-Nearest-Neighbor is the best method for modelling these Pose classes.

• The more restrictive the system is, the better the results will be: a compromise solution is needed.

35

Future Work

• Make a new Dataset with the improved hand detection.

• Add a new feature for detecting more Non-Poses, e.g., face detection.

• Elbow detection.

36

37

I appreciate your attention

Questions?

Initial Dataset

38

Improved Dataset

39

Spatial Game

40