3D Image Reconstruction and Human Body Tracking using Stereo Vision and Kinect Technology


Introduction

Two different types of 3D image reconstruction methods:

Stereo Vision: generates depth perception information using high quality stereo image textures. It is computationally heavy and inefficient.

Kinect Technology: the overall image quality is not refined and the resolution is low.

This work explores a method that produces higher quality images and faster computation of depth information.

What is Kinect?

Kinect is a motion sensing input peripheral for the Microsoft Xbox 360 video game console.

It consists of three basic components: an Infrared (IR) Projector, an RGB Camera and an IR Monochrome Camera.

The IR projector emits infrared signals, and the 3D depth map is generated from the reflected infrared signals captured by the IR monochrome camera.

It requires less computational time, which makes it suitable for dynamic environments.


3D Geometrical Models

A 3D geometrical model is used for representing real-world objects as 3D image models built from 2D images captured by cameras.

It requires camera calibration, since the intrinsic and extrinsic parameters differ from camera to camera.

Two types of camera calibration are used: Single Camera Calibration Geometry and Two Camera Image Plane (Epipolar) Geometry.

Single Camera Calibration Geometry

It defines three different coordinate systems: a real-world coordinate system, a camera coordinate system and an image coordinate system.

For calibration, a point on the image plane moves from one position to another across the images taken by the camera.

The intrinsic parameters remain the same. To estimate the extrinsic parameters, the rotation matrix R and the translation vector t are calculated.

The calibration equation can be represented as shown below, where matrix M1 represents the intrinsic parameters, matrix M2 the extrinsic parameters, and Pw the real-world 3D coordinates.
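The equation image on the original slide is not preserved in the transcript; a standard pinhole-model form consistent with the definitions above (scale factor s, image point (u, v), intrinsics M1, extrinsics M2 = [R | t]) would be:

```latex
% Standard pinhole calibration equation (reconstructed; the original slide's
% exact notation is not preserved in the transcript).
% s      : projective scale factor
% (u, v) : image coordinates of the projected point
% M1     : 3x3 intrinsic parameter matrix
% M2     : 3x4 extrinsic parameter matrix [R | t]
% Pw     : homogeneous real-world coordinates (Xw, Yw, Zw, 1)
\[
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = M_1 M_2 P_w
  = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
    \begin{bmatrix} R \;|\; t \end{bmatrix}
    \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
\]
```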

Epipolar Geometry

The property of epipolar lines is defined as:

"If a feature projects to a point Pl in one camera view, the corresponding image point Pr in the other camera view must lie somewhere on an epipolar line in the camera image. An image point in camera 1 corresponds to an epipolar line in camera 2 and vice versa."
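This property can be written compactly with the fundamental matrix F; the slides do not introduce F explicitly, so treat the notation below as an assumption rather than the presenter's own:

```latex
% Epipolar constraint in standard notation.
% pl, pr : homogeneous image points in camera 1 and camera 2
% F      : fundamental matrix relating the two views
\[
p_r^{\top} F \, p_l = 0, \qquad l_r = F p_l, \qquad l_l = F^{\top} p_r
\]
% pr must lie on the epipolar line l_r = F pl, and vice versa.
```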

3D Reconstruction and Human Tracking

OpenCV Stereo Vision Calibration

The 3D geometrical model requires camera calibration and depth perception mapping.

OpenCV stereo vision uses two cameras to construct a stereoscopic image.

The intrinsic and extrinsic parameters of the two cameras are estimated.

Stereo calibration is implemented by transforming the estimated parameters of each camera into a joint coordinate system.

The corners of a chessboard image are used to register the real-world coordinate system with the 3D geometrical model, as in the sketch below.
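A minimal OpenCV (Python) sketch of this calibration step, assuming a 9x6 chessboard and ten paired left/right images; the file names, board size and square size are placeholders, not values from the slides:

```python
import cv2
import numpy as np

BOARD = (9, 6)          # inner corners of the assumed chessboard (placeholder)
SQUARE = 0.025          # square size in metres (placeholder)

# Real-world coordinates of the chessboard corners (Z = 0 plane).
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, left_pts, right_pts = [], [], []
for i in range(10):                                   # placeholder image pairs
    left = cv2.imread(f"left_{i}.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread(f"right_{i}.png", cv2.IMREAD_GRAYSCALE)
    okL, cornersL = cv2.findChessboardCorners(left, BOARD)
    okR, cornersR = cv2.findChessboardCorners(right, BOARD)
    if okL and okR:
        obj_pts.append(objp)
        left_pts.append(cornersL)
        right_pts.append(cornersR)

size = left.shape[::-1]                               # (width, height)

# Intrinsics of each camera estimated separately.
_, M1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, M2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)

# Stereo calibration: R, T place both cameras in a joint coordinate system.
_, M1, d1, M2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, M1, d1, M2, d2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)

# Rectification; Q reprojects disparities to 3D (used in the next step).
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(M1, d1, M2, d2, size, R, T)
```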

3D Image Reconstruction

It requires the computation of a depth perception image.

First, the intrinsic and extrinsic parameters are estimated using OpenCV stereo vision combined with the OpenCV calibration functions.

Then, the 3D depth map is calculated and the 3D reconstructed image is derived from it (see the sketch below).

The OpenCV stereo vision 3D depth map calculation requires heavy computation, which is inefficient for real-time 3D video streaming.

Noise is present in the depth map, which leads to a corrupted 3D reconstruction due to the high resolution mapping and depth map estimation errors.
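A minimal sketch of the depth map and reprojection step, continuing from the calibration sketch above (it reuses M1, d1, M2, d2, R1, R2, P1, P2, Q and size from there). The slides do not name a specific matcher, so the use of StereoSGBM and its parameters is an assumption:

```python
import cv2
import numpy as np

# Rectify a left/right pair using the calibration results (placeholder images).
mapLx, mapLy = cv2.initUndistortRectifyMap(M1, d1, R1, P1, size, cv2.CV_32FC1)
mapRx, mapRy = cv2.initUndistortRectifyMap(M2, d2, R2, P2, size, cv2.CV_32FC1)
rectL = cv2.remap(cv2.imread("left_0.png", 0), mapLx, mapLy, cv2.INTER_LINEAR)
rectR = cv2.remap(cv2.imread("right_0.png", 0), mapRx, mapRy, cv2.INTER_LINEAR)

# Semi-global block matching; parameters are illustrative only.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = matcher.compute(rectL, rectR).astype(np.float32) / 16.0  # fixed point

# Reproject disparities to a 3D point cloud using Q from stereoRectify.
points_3d = cv2.reprojectImageTo3D(disparity, Q)   # H x W x 3 array of (X, Y, Z)
```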

3D Image Reconstruction using Kinect

It requires less computational time compared to OpenCV stereo vision.

The IR sensor requires less time to obtain depth information and also gives accurate depth perception.

The 3D image is reconstructed by integrating the 3D depth map (obtained from the Kinect sensors) with the image from the Kinect RGB camera, with the help of OpenGL (Open Graphics Library); a back-projection sketch follows below.

The main disadvantage of this method is the low quality of the reconstructed 3D image, due to the low quality of the Kinect RGB camera.
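A minimal NumPy sketch of how a Kinect depth frame can be back-projected into a colored point cloud before handing it to OpenGL for rendering. The intrinsic values and the synthetic frames are placeholders, and the RGB frame is assumed to be already registered to the depth frame:

```python
import numpy as np

# Placeholder Kinect depth-camera intrinsics (focal length, principal point);
# real values come from calibration or the device specification.
FX, FY, CX, CY = 580.0, 580.0, 320.0, 240.0

def depth_to_points(depth_m, rgb):
    """Back-project a depth frame (metres, H x W) into colored 3D points."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - CX) * z / FX          # pinhole model: X = (u - cx) * Z / fx
    y = (v - CY) * z / FY
    valid = z > 0                  # Kinect reports 0 where depth is unknown
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid]            # per-point color from the RGB camera
    return points, colors          # e.g. upload as OpenGL vertex/color buffers

# Example with synthetic data standing in for a real Kinect frame:
depth = np.full((480, 640), 1.5, dtype=np.float32)
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
pts, cols = depth_to_points(depth, rgb)
```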

3D Image Reconstruction using both Kinect and HD Camera

The fast computation of depth perception information from the Kinect and the high quality texture from the HD webcam are combined to improve the reconstructed 3D image.

Using the Kinect, a 3D depth map is retrieved and integrated with the high quality image from the HD camera in OpenGL, which is suitable for real-time dynamic systems; a texture-mapping sketch follows below.

This design flow based on the Kinect and one HD camera delivers real-time, high resolution 3D images at 30 FPS (frames per second).

The reconstructed 3D image has high resolution while requiring less computational time.
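A minimal sketch of the texture-mapping idea: the Kinect 3D points are transformed into the HD camera's frame with an extrinsic (R, t) from a Kinect-to-HD calibration and projected with the HD intrinsics to sample colors. The matrix names are placeholders obtained from a calibration like the one sketched earlier, not values from the slides:

```python
import numpy as np

def sample_hd_texture(points, hd_image, K_hd, R, t):
    """Color Kinect 3D points with pixels from the HD camera image.

    points   : N x 3 array of 3D points in the Kinect depth-camera frame
    hd_image : H x W x 3 HD webcam frame
    K_hd     : 3 x 3 intrinsic matrix of the HD camera (placeholder)
    R, t     : rotation / translation from the Kinect frame to the HD frame
    """
    # Transform into the HD camera frame and project with the pinhole model.
    cam = points @ R.T + t                      # N x 3
    uv = cam @ K_hd.T                           # N x 3 homogeneous pixels
    u = (uv[:, 0] / uv[:, 2]).round().astype(int)
    v = (uv[:, 1] / uv[:, 2]).round().astype(int)

    h, w = hd_image.shape[:2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (cam[:, 2] > 0)

    colors = np.zeros((len(points), 3), dtype=hd_image.dtype)
    colors[inside] = hd_image[v[inside], u[inside]]   # high-resolution texture
    return colors, inside
```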

Conclusion

OpenCV stereo vision and Kinect are introduced for 3D image reconstruction. Taking advantage of the Kinect depth map obtained with infrared sensors, faster generation of the depth map is achievable.

Also, with the high definition webcam, the texture of the reconstructed 3D images can be improved.

Thus, combining the Kinect with the HD webcam can deliver high quality 3D image reconstruction for real-time video streaming. This high quality 3D image can be used for hand tracking and finger gesture detection and recognition.

Thank you