
3D Hand Pose Estimation by Finding Appearance-Based Matches in a Large Database of Training Views

2006.8.17

Outline

Introduction
Proposed Framework
Space Complexity
Synthetic Versus Real Training Data
Edge-Based View Matching
Experimental Results
Future Work
Conclusions

Introduction

Estimate the 3D hand pose from a single image by matching the image against a large database of training views.

What are the storage requirements for an adequate database of training views?

What similarity measures are appropriate? How can the matching be done efficiently?

Introduction

The database contains more than 100,000 images, generated from 26 hand shapes.

In real images, the hand is located using skin color detection.

Proposed Framework

The hand is modeled as an articulated object consisting of 16 links: the palm and 15 links corresponding to finger parts.

Proposed Framework

The five joints connecting the fingers to the palm allow rotation with two degrees of freedom (DOFs).

The 10 joints between finger links allow rotation with one DOF.

In total, 20 DOFs (5 × 2 + 10 × 1) completely describe the joint angles.
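As a quick consistency check on the count above, the sketch below enumerates the 15 joints and sums their DOFs. The joint names and the per-finger grouping (one two-DOF joint at the palm followed by two one-DOF joints) are illustrative assumptions that match the 5 + 10 joint counts on this slide, not the paper's own data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Joint:
    name: str
    dofs: int


# Hypothetical finger names; each finger is modeled with 3 links (15 in total).
FINGERS = ["thumb", "index", "middle", "ring", "little"]


def build_joints() -> list:
    joints = []
    for finger in FINGERS:
        # Joint connecting the finger to the palm: 2 DOFs.
        joints.append(Joint(f"{finger}_palm_link1", 2))
        # Two joints between the finger's three links: 1 DOF each.
        joints.append(Joint(f"{finger}_link1_link2", 1))
        joints.append(Joint(f"{finger}_link2_link3", 1))
    return joints


JOINTS = build_joints()
TOTAL_DOFS = sum(j.dofs for j in JOINTS)
print(len(JOINTS), TOTAL_DOFS)  # 15 20
```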

Proposed Framework

To these joint angles, add the viewing parameters.

Given a hand configuration vector $C = (c_1, \ldots, c_{20})$ and a viewing parameter vector $V = (v_1, v_2, v_3)$, define the hand pose vector $P = (c_1, \ldots, c_{20}, v_1, v_2, v_3)$.
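A minimal sketch of assembling the pose vector, assuming NumPy and plain floating-point parameters. The slides do not specify units or the exact meaning of each viewing parameter, so the values below are placeholders.

```python
import numpy as np

N_CONFIG = 20  # joint-angle parameters c_1 ... c_20
N_VIEW = 3     # viewing parameters v_1, v_2, v_3


def pose_vector(c, v) -> np.ndarray:
    """Concatenate the configuration vector C and viewing vector V into P."""
    c = np.asarray(c, dtype=float)
    v = np.asarray(v, dtype=float)
    if c.shape != (N_CONFIG,) or v.shape != (N_VIEW,):
        raise ValueError("expected 20 configuration and 3 viewing parameters")
    return np.concatenate([c, v])


# Placeholder values; units are not specified on the slides.
P = pose_vector(np.zeros(N_CONFIG), [0.1, 0.2, 0.3])
print(P.shape)  # (23,)
```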

Proposed Framework

The generic framework that we propose for hand pose estimation is the following:

1. Create a database containing a uniform sampling of all possible views of all possible hand configurations.

2. For each novel image, find the most similar database views, and use their parameters as estimates of the pose in the image.
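The two steps can be sketched as a brute-force nearest-neighbor lookup. This is an illustrative sketch, not the paper's implementation: the function name `estimate_pose`, the `(view, pose)` database layout, and the linear scan are assumptions, and `distance` stands in for whatever similarity measure is used (the edge-based chamfer distance is one option). The default `k=10` mirrors the top-ten evaluation in the conclusions.

```python
import heapq
from typing import Callable, Sequence, Tuple, TypeVar

View = TypeVar("View")  # e.g. a set of edge-pixel coordinates
Pose = Tuple[float, ...]


def estimate_pose(
    query: View,
    database: Sequence[Tuple[View, Pose]],
    distance: Callable[[View, View], float],
    k: int = 10,
) -> list:
    """Return the poses of the k database views closest to the query."""
    # Score every stored view (linear scan); the index breaks ties.
    scored = ((distance(query, view), i) for i, (view, _) in enumerate(database))
    best = heapq.nsmallest(k, scored)
    return [database[i][1] for _, i in best]


# Toy usage: 1-D "views" compared by absolute difference.
db = [(0.0, (0.0,)), (1.0, (10.0,)), (2.5, (20.0,)), (3.0, (30.0,))]
print(estimate_pose(2.6, db, lambda a, b: abs(a - b), k=2))  # [(20.0,), (30.0,)]
```

A linear scan makes the cost of each query proportional to the database size, which is why efficient matching is one of the questions raised in the introduction.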

Space Complexity

Storage depends on the number of database images.

In this paper, 86 viewpoints are used, and 48 images are generated for each viewpoint.
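The storage requirement grows linearly with the number of rendered images. A small sketch of the arithmetic, using the 86 viewpoints and 48 images per viewpoint above; the per-image size passed to `storage_bytes` is a hypothetical parameter, since the slides give no storage figure. With the 26 hand shapes used in the experiments this gives 107,328 images, consistent with the "more than 100,000" stated in the introduction.

```python
VIEWPOINTS = 86             # viewing directions per hand shape
IMAGES_PER_VIEWPOINT = 48   # rendered images per viewing direction


def database_size(num_shapes: int) -> int:
    """Number of training images for a given number of hand shapes."""
    return num_shapes * VIEWPOINTS * IMAGES_PER_VIEWPOINT


def storage_bytes(num_shapes: int, bytes_per_image: int) -> int:
    """Total storage, given a (hypothetical) average size per stored image."""
    return database_size(num_shapes) * bytes_per_image


print(database_size(1))   # 4128
print(database_size(26))  # 107328
```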

PCA is used to reduce the dimensionality of the hand configuration space.
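The slide does not say exactly which data PCA is applied to. Assuming it is run on sampled 20-dimensional joint-angle vectors, a minimal NumPy sketch (via SVD of the centered data) looks like this; the input here is random low-rank data, not real hand configurations.

```python
import numpy as np


def pca_reduce(configs: np.ndarray, n_components: int):
    """Project (N, d) configuration vectors onto their top principal axes."""
    mean = configs.mean(axis=0)
    centered = configs - mean
    # Rows of vt are orthonormal principal directions, by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    coords = centered @ basis.T          # (N, n_components) reduced coordinates
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return mean, basis, coords, explained


# Synthetic stand-in: 500 random 20-D vectors that lie near a 3-D subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
configs = latent @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(500, 20))

mean, basis, coords, explained = pca_reduce(configs, n_components=3)
approx = mean + coords @ basis           # reconstruction from 3 coordinates
print(coords.shape, round(explained, 3))
```

Keeping only the leading components shrinks the space of configurations that must be sampled and rendered, which is where the storage savings would come from.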

Synthetic Versus Real Training Data

A major advantage of synthetic training sets is that the data can be labeled automatically.

Problem with real training data: correct labels are hard to obtain and require a multi-camera setup.

Edge-Based View Matching

Image similarity is defined using the chamfer distance.

Given an input image, extract its edge pixels with an edge detector (Canny) and store their coordinates in a set X.
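A simplified sketch of building the edge set X with NumPy. This is a plain gradient-magnitude threshold standing in for Canny (it omits Canny's smoothing, non-maximum suppression, and hysteresis), and the `threshold` value is an assumption; in practice a full Canny implementation such as OpenCV's `cv2.Canny` would produce the edge map.

```python
import numpy as np


def edge_pixel_set(image, threshold: float) -> np.ndarray:
    """Return (row, col) coordinates of pixels with strong intensity gradients.

    Simplified stand-in for Canny: no smoothing, non-maximum suppression,
    or hysteresis thresholding.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)        # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    return np.argwhere(magnitude > threshold)  # the point set X, shape (n, 2)


# Toy image: dark left half, bright right half (a vertical step edge).
img = np.zeros((8, 8))
img[:, 4:] = 1.0
X = edge_pixel_set(img, threshold=0.25)
print(len(X))  # 16
```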

Experimental Results

The database contains 26 different hand shapes; each shape is rendered from 86 viewing directions, with 48 images per direction.

The test set consists of 28 real images of hand poses.


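Define the distance $D$ between a point $x$ and a set of points $X$ to be the Euclidean distance between $x$ and the point in $X$ that is closest to $x$:

$$D(x, X) = \min_{x' \in X} \lVert x - x' \rVert$$

The chamfer distance $D_c$ between two edge-pixel sets $X$ and $Y$ averages these point-to-set distances in both directions:

$$D_c(X, Y) = \frac{1}{|X|} \sum_{x \in X} D(x, Y) + \frac{1}{|Y|} \sum_{y \in Y} D(y, X)$$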
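A direct NumPy sketch of the two definitions above, assuming edge pixels are given as arrays of 2-D coordinates. It computes all pairwise distances, which costs O(|X| |Y|) memory; chamfer matching is commonly accelerated with a distance transform instead, and this sketch makes no claim about how the paper computes it.

```python
import numpy as np


def point_to_set_distance(x, X) -> float:
    """D(x, X): Euclidean distance from point x to the closest point of X."""
    diffs = np.asarray(X, dtype=float) - np.asarray(x, dtype=float)
    return float(np.linalg.norm(diffs, axis=1).min())


def chamfer_distance(X, Y) -> float:
    """D_c(X, Y): mean of D(x, Y) over X plus mean of D(y, X) over Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # pairwise[i, j] = ||X[i] - Y[j]||
    pairwise = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    # Row minima are D(x, Y) for each x; column minima are D(y, X) for each y.
    return float(pairwise.min(axis=1).mean() + pairwise.min(axis=0).mean())


X = [[0, 0], [1, 0]]
Y = [[0, 1]]
print(chamfer_distance(X, Y))  # (1 + sqrt(2)) / 2 + 1, about 2.207
```

Because both directions are averaged, the measure is symmetric, and it penalizes edges missing from either image rather than only those missing from one.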

Future work

Use real hand images in the database.

Add a finger detector.

Conclusions

For almost half of the test images, the system retrieved correct views among the top ten matches.
