
IJDW Volume 4 • Number 1 • January-June 2012 • pp. 45-50

3D FACE RECONSTRUCTION BASED ON EPIPOLAR GEOMETRY

Taher Khadhraoui1, Faouzi Benzarti2 and Hamid Amiri3

1,2,3 Signal, Image Processing and Patterns Recognition (TSRIF) Laboratory, National School Engineering of Tunis (ENIT), Tunisia
1 E-mail: [email protected]

Abstract: The aim of 3D reconstruction is to retrieve the 3D geometric representation of an object from a single image, several images, or a sequence of images. It has been used in a number of application domains, with specific methods for buildings and towns or body parts. 3D face reconstruction is a technology for reconstructing three-dimensional face geometry from media such as images and videos. This paper presents a global scheme for 3D face reconstruction into a limited number of analytical patches from stereo images. From a depth map, we generate a 3D model of the face.

Keywords: component; Stereo Vision; 3D Face Reconstruction; Face Alignment

1. INTRODUCTION

The recognition and identification of faces play a fundamental role in our social interactions.

Various applications in Computer Vision, Computer Graphics and Computational Geometry require a surface reconstruction from a 3D point cloud extracted by stereovision from a sequence of overlapping images [2]. The construction of 3D facial models is an important topic in computer vision that has recently received attention within the research community. 3D face reconstruction from 2D images is very important for face recognition and facial analysis.

The epipolar geometry between two views is essentially the geometry of the intersection of the image planes with the pencil of planes having the baseline as axis (the baseline is the line joining the camera centres). This geometry is usually motivated by considering the search for corresponding points in stereo matching, and we will start from that objective here. Epipolar geometry is a strong tool in Computer Vision. The essential idea is that, for a stereo pair of cameras, the projections of a 3D point onto the two image planes lie on a plane defined by the two camera centers and the 3D point. Furthermore, any such plane always passes through a particular pair of points, one on each image plane, called the epipoles.

This means that, given knowledge of the relative position and orientation of a pair of cameras and a given pixel in one image, we can constrain our search for the corresponding pixel in the other image to a single line, rather than the entire image.

Stereovision is one of the effective methods to estimate depth and structural parameters of 3D objects from a pair of 2D images [1]. Recent advances in multi-view stereovision provide an exciting alternative for outdoor scene reconstruction [2]. In this paper we are interested in recovering the 3D face model based on epipolar geometry.

This paper is organized as follows: In Section 2, we review some previous related work. Homogeneous coordinates are introduced in Section 3. Section 4 describes the stereoscopy method for 3D face reconstruction. The experimental results are given in Section 5.

2. RELATED WORK

In this section we provide a general overview of 3D face reconstruction methods reported in the literature. Kumar et al. [4] [12] proposed a reconstruction methodology for quadratic curves using non-digitized image planes, and extended the problem to digitized and normalized image planes by considering various noisy cases and analyzing the error in the reconstruction process. Zhang et al. [5] proposed a model-based algorithm to fill the missing range information of a planar region in the depth map of an image obtained from a commercial stereo vision system. With the morphable 3D face model, Blanz and Vetter et al. [6] presented a 3D reconstruction algorithm that recovers the shape and texture parameters from a face image in an arbitrary view. However, its speed cannot satisfy the requirements of practical face recognition systems.

Pighin et al. [8] used a generic face model and multiple images to recover the 3D face model. The method estimates depth information from multiple images. However, with the generic face model, many points must be specified to obtain an accurate 3D model, and mis-registration errors cannot be corrected. More recently, Jiang et al. [7] presented an automatic 2D-to-3D integrated face reconstruction method that recovers the 3D face model from a single frontal face image and is much faster. However, because a single image carries no depth information, Jiang's method cannot recover depth accurately.

3. HOMOGENEOUS COORDINATES

In a normal Euclidean coordinate system, the length of the coordinate vector equals the number of dimensions of the space in which a point lies [9]. Therefore, a 3×1 vector is sufficient to describe a point in the 3D world. However, homogeneous coordinates are used for 3D reconstruction purposes, where a 1 is appended to the end of a normal coordinate vector. The reason for this additional dimension is to express operations that are non-linear in 3-dimensional space, e.g. translation, as linear operations.
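As a small illustration of why the extra coordinate helps, the sketch below writes a 3D translation as a single 4×4 matrix product. The helper names `to_homogeneous` and `translation_matrix` are chosen here for illustration and do not come from the paper.

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1 to a Euclidean coordinate vector."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def translation_matrix(t):
    """Build the 4x4 matrix that translates homogeneous 3D points by t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

# A translation, which is not a linear map on 3-vectors,
# becomes a plain matrix-vector product on 4-vectors.
p = to_homogeneous([1.0, 2.0, 3.0])
T = translation_matrix([10.0, 0.0, -1.0])
q = T @ p  # homogeneous coordinates of the translated point
```

Because rotations and translations are both matrices in this representation, a chain of rigid motions collapses into one matrix product.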

3.1 Intrinsic and Extrinsic Calibration

Stereo calibration involves determining the parameters of a system using corresponding 2D points. The intrinsic parameters of a camera include the focal lengths measured in units of pixel width and height (f_x, f_y), the skew (s) and the principal point (c_x, c_y) [10]. Therefore, the coordinates should be scaled with these parameters as follows:

\[
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
\begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_R \\ y_R \\ 1 \end{pmatrix}
\tag{1}
\]

The upper triangular matrix in this equation is called the calibration matrix and is denoted K.

In addition, the pictures are never taken from the same camera position and angle. The rotation matrix R and the translation vector t are the extrinsic camera parameters. We can combine the intrinsic parameters with a specific camera position and orientation into a single equation:

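\[
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \simeq
\begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} R^T & -R^T t \\ 0^T & 1 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\]

\[
\tilde{m} \simeq K \, [\, R^T \mid -R^T t \,] \, \tilde{M},
\]

\[
\tilde{m} \simeq P \tilde{M}
\tag{2}
\]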

The 3×4 matrix P is called the camera projection matrix.
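A minimal sketch of how a projection matrix maps a 3D point to a pixel is given below. It assumes the convention P = K [Rᵀ | −Rᵀt], where R and t are the orientation and position of the camera in the world frame; the function names and the numerical values of K are illustrative and are not taken from the paper.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Assemble P = K [R^T | -R^T t] (assumed convention: R, t give the
    camera orientation and position in the world frame)."""
    Rt = np.hstack([R.T, (-R.T @ t).reshape(3, 1)])
    return K @ Rt

def project(P, X):
    """Project a Euclidean 3D point X and return pixel coordinates (x, y)."""
    m = P @ np.append(X, 1.0)   # homogeneous image point
    return m[:2] / m[2]         # divide out the projective scale

# Hypothetical intrinsics: 800 px focal length, principal point (320, 240), no skew.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                   # camera aligned with the world axes
t = np.zeros(3)                 # camera at the world origin
pixel = project(projection_matrix(K, R, t), np.array([0.1, -0.05, 2.0]))
```

The division by the third homogeneous coordinate is what makes perspective projection non-linear in Euclidean coordinates, even though P itself is a linear map.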

3.2 Uncalibrated Camera Problem

Finding the depth information of a point becomes more difficult when cameras are uncalibrated, meaning that the intrinsic and extrinsic parameters of the cameras are unknown. Therefore, it is necessary to find more matched points between two images for the uncalibrated case. When two cameras view a 3D scene from different positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images. Epipolar geometry refers to the geometry of this stereovision [11]. For epipolar geometry, the terms epipole and epipolar line are important.

3.3 Epipolar Geometry and the Fundamental Matrix

The epipolar geometry is the intrinsic projective geometry between two views. It is independent of scene structure, and only depends on the cameras' internal parameters and relative pose.

3.3.1 Fundamental Matrix

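The fundamental matrix F is the mapping of an image point x to its epipolar line l':

• l' = F x is the epipolar line corresponding to x

• l = F^T x' is the epipolar line corresponding to x'

It can be shown that

\[
F = [e']_\times \, P' P^{+}
\tag{3}
\]

where e' is the epipole in the second image, P^{+} is the pseudo-inverse of P, and, for a 3-vector e = (e_1, e_2, e_3)^T,

\[
[e]_\times =
\begin{pmatrix} 0 & -e_3 & e_2 \\ e_3 & 0 & -e_1 \\ -e_2 & e_1 & 0 \end{pmatrix}
\tag{4}
\]

is the matrix representation of the cross product.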


Also, for any pair of corresponding points x, x':

\[
x'^T F x = 0
\tag{5}
\]

F is a rank 2 homogeneous matrix with 7 degrees of freedom.

F is invariant under a projective transformation H of the world space, i.e., even if X → HX, by letting P → PH^{-1}, F remains unchanged. There is therefore a projective ambiguity in P.
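The relation between the camera matrices, the fundamental matrix and the epipolar constraint can be checked numerically. The sketch below builds F from two camera matrices in the standard form F = [e']× P' P⁺ and evaluates x'ᵀ F x on synthetic correspondences. The cameras, the random scene points and all function names are illustrative assumptions, not data from the paper.

```python
import numpy as np

def skew(v):
    """Matrix form of the cross product: skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [e']_x P' P^+ for the camera pair (P, P') = (P1, P2)."""
    _, _, Vt = np.linalg.svd(P1)
    C = Vt[-1]                       # centre of the first camera: P1 @ C = 0
    e2 = P2 @ C                      # epipole in the second image
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Hypothetical calibrated pair: second camera slightly rotated and shifted.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-0.2, 0.0, 0.02])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t.reshape(3, 1)])

F = fundamental_from_cameras(P1, P2)
F /= np.linalg.norm(F)               # F is only defined up to scale

# Project random scene points into both views and evaluate x'^T F x.
rng = np.random.default_rng(0)
X = np.hstack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 6, (20, 1)), np.ones((20, 1))])
x1 = (P1 @ X.T).T
x2 = (P2 @ X.T).T
x1 /= x1[:, 2:]
x2 /= x2[:, 2:]
residuals = np.einsum('ij,jk,ik->i', x2, F, x1)   # one value per correspondence
singular_values = np.linalg.svd(F, compute_uv=False)
```

The residuals vanish up to rounding error, and the smallest singular value of F is numerically zero, in line with F having rank 2.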

3.3.2 Essential Matrix

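The essential matrix is the fundamental matrix with the calibration matrices K, K' removed, i.e., image points are normalized by

\[
\hat{x} = K^{-1} x, \qquad \hat{x}' = K'^{-1} x'
\tag{6}
\]

Letting E = K'^T F K, the epipolar constraint becomes

\[
E = K'^T F K, \qquad \hat{x}'^T E \, \hat{x} = 0
\tag{7}
\]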

For normalized cameras P = [I | 0] and P' = [R | t], the following holds:

\[
E = [t]_\times R
\tag{8}
\]

E has 5 degrees of freedom: 3 rotation angles in R plus 3 elements in t, minus one for the arbitrary overall scale.

Given the SVD E = U diag(1, 1, 0) V^T, and assuming the first camera is P = [I | 0], the second camera is one of the following:

\[
P' = [\, U W V^T \mid +u_3 \,] \ \text{or} \ [\, U W V^T \mid -u_3 \,] \ \text{or} \ [\, U W^T V^T \mid +u_3 \,] \ \text{or} \ [\, U W^T V^T \mid -u_3 \,]
\tag{9}
\]

where u_3 is the last column of U and

\[
W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

Only one of these is physically possible (positive depth from both cameras).
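The four candidate cameras can be enumerated directly from the SVD. The following sketch assumes normalized cameras and a unit-length translation; the sign flips on U and Vᵀ are an added implementation detail (not stated in the text) that keeps both candidate rotations proper, and the function names are illustrative.

```python
import numpy as np

def skew(v):
    """Matrix form of the cross product."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates for the second camera [R | t]."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to scale, so negating U or Vt is allowed;
    # doing so guarantees det(R) = +1 for both candidate rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    u3 = U[:, 2]                     # translation direction, up to sign
    R_a = U @ W @ Vt
    R_b = U @ W.T @ Vt
    return [(R_a, u3), (R_a, -u3), (R_b, u3), (R_b, -u3)]

# Synthetic check: build E from a known pose and look for it among the candidates.
a = 0.2
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([1.0, 0.0, 0.0])   # unit length, since scale is not recoverable
candidates = decompose_essential(skew(t_true) @ R_true)
found = any(np.allclose(R, R_true, atol=1e-8) and np.allclose(t, t_true, atol=1e-8)
            for R, t in candidates)
```

In practice the physically valid candidate is usually selected by triangulating one correspondence with each of the four and keeping the one that places the point in front of both cameras.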

3.3.3 Computing the Fundamental Matrix

The fundamental matrix can be estimated up to scale. Write

\[
F = \begin{pmatrix} f_1 & f_2 & f_3 \\ f_4 & f_5 & f_6 \\ f_7 & f_8 & f_9 \end{pmatrix},
\qquad
f = [f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9]^T
\tag{10}
\]

Each pair of corresponding points (x, y, 1) and (x', y', 1) gives one equation:

\[
(x'x, \; x'y, \; x', \; y'x, \; y'y, \; y', \; x, \; y, \; 1) \, f = 0
\tag{11}
\]

Stacking n equations from n point correspondences gives the linear system A f = 0, where A is an n×9 matrix. If rank A = 8 then the solution is unique (up to scale), but in reality we seek a least-squares solution with n ≥ 8. The least-squares solution is the last column of V in the SVD A = U D V^T (the last column corresponds to the smallest singular value).
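The linear estimate described above can be sketched in a few lines. This is a minimal version under stated assumptions: coordinates are of order one (for raw pixel coordinates, Hartley's normalization of the points is usually applied first), and a final rank-2 projection is added because the least-squares solution does not enforce the rank constraint by itself. The synthetic cameras and the function name are illustrative.

```python
import numpy as np

def estimate_fundamental(x1, x2):
    """Linear estimate of F from n >= 8 correspondences.

    x1, x2: (n, 2) arrays of matching points (x, y) and (x', y').
    """
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    # One row (x'x, x'y, x', y'x, y'y, y', x, y, 1) per correspondence.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)         # right singular vector of the smallest singular value
    # Extra step (not in the linear system): project onto rank-2 matrices.
    U, S, Vt_f = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt_f

# Synthetic correspondences from two hypothetical normalized cameras.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-0.2, 0.05, 0.02])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, t.reshape(3, 1)])

rng = np.random.default_rng(1)
X = np.hstack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 6, (20, 1)), np.ones((20, 1))])
h1 = (P1 @ X.T).T
h2 = (P2 @ X.T).T
h1 /= h1[:, 2:]
h2 /= h2[:, 2:]

F_est = estimate_fundamental(h1[:, :2], h2[:, :2])
residuals = np.einsum('ij,jk,ik->i', h2, F_est, h1)   # x'^T F x per correspondence
```

With noise-free data the residuals are at rounding level; with real matches the same code returns the least-squares fit, and robust sampling schemes are commonly wrapped around it to reject outliers.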

Figure 1 shows the epipolar geometry:

Figure 1: The Epipolar Geometry Representation

4. STEREOSCOPY METHOD

The algorithm for stereo matching employs epipolar geometry based face reconstruction to estimate the disparity map on a stereo pair. The stereo epipolar approach requires two off-the-shelf digital cameras that are connected together and calibrated so that they focus on the same object. The framework of stereo-based face reconstruction is as follows:


The process of 3D face reconstruction consists of the following stages:

• Perform uncalibrated stereo image rectification on a pair of stereo images,

• Match individual pixels along epipolar lines to compute the disparity map,

• Combine the disparity map with the original stereo images to create a 3D reconstruction of the scene.

The main steps of the stereo method are described below.

4.1 Stereo Image Pair

In this step, the color stereo image pair is acquired and converted to grayscale for the matching process. Using color images may provide some improvement in accuracy.

4.2 Rectify Stereo Images

Rectification is the process of transforming stereo images such that corresponding points have the same row coordinates in the two images. It is a useful procedure in stereo vision, as the 2-D stereo correspondence problem is reduced to a 1-D problem when rectified image pairs are used.

4.3 Points Corresponding

In stereo correspondence matching, two images of the same scene are taken from slightly different viewpoints using two cameras placed in the same lateral plane. As a result, for most pixels in the left image there is a corresponding pixel in the right image on the same horizontal line. The difference in the coordinates of the corresponding pixels is known as disparity. The basics of stereo correspondence matching are as follows:

For each epipolar line

    For each pixel in the left image

        - compare with every pixel on the same epipolar line in the right image

        - pick the pixel with minimum match cost
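The matching loop above can be sketched as a brute-force block matcher on a rectified pair. The match cost is not specified in the text; this sketch assumes a sum of absolute differences (SAD) over a small window and restricts the search to a maximum disparity instead of scanning the whole line. The parameters `max_disp` and `half_win` are illustrative.

```python
import numpy as np

def disparity_map(left, right, max_disp=16, half_win=2):
    """Brute-force SAD block matching on rectified grayscale images.

    left, right: 2D float arrays of equal shape. For each left pixel the
    candidates are the right-image pixels on the same row, shifted by
    d = 0..max_disp columns to the left.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for r in range(half_win, h - half_win):            # each epipolar line (row)
        for c in range(half_win, w - half_win):        # each pixel in the left image
            patch = left[r - half_win:r + half_win + 1, c - half_win:c + half_win + 1]
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, c - half_win) + 1):
                cand = right[r - half_win:r + half_win + 1,
                             c - d - half_win:c - d + half_win + 1]
                cost = np.abs(patch - cand).sum()      # SAD match cost
                if cost < best_cost:                   # keep the minimum-cost match
                    best_cost, best_d = cost, d
            disp[r, c] = best_d
    return disp

# Synthetic check: the left image is the right image shifted by 3 columns.
rng = np.random.default_rng(0)
right_img = rng.random((12, 24))
left_img = np.roll(right_img, 3, axis=1)
disp = disparity_map(left_img, right_img, max_disp=8, half_win=2)
```

The triple loop is written for readability rather than speed; border pixels without a full window are left at zero disparity.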

4.4 Disparity Map

The disparity can be defined by the following equation [3]:

d = bf / z (12)

where z is the distance of the object point from the camera (the depth), b is the base distance between the left and right cameras, and f is the focal length of the camera lens.

The disparity map and the knowledge of the relative distance between the two cameras are used to compute the depth map.
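Inverting Equation (12) gives the depth z = bf / d for every pixel with a non-zero disparity. The sketch below assumes the focal length is expressed in pixels and the baseline in metres, so that depth comes out in metres; the numerical values are made up for illustration.

```python
import numpy as np

def depth_from_disparity(disp, baseline, focal_px):
    """Convert a disparity map to a depth map using z = b * f / d.

    Pixels with zero disparity have no finite depth and are set to infinity.
    """
    disp = np.asarray(disp, dtype=float)
    depth = np.full(disp.shape, np.inf)
    valid = disp > 0
    depth[valid] = baseline * focal_px / disp[valid]
    return depth

# Hypothetical rig: 0.1 m baseline, 800 px focal length.
disp = np.array([[8.0, 16.0],
                 [0.0, 40.0]])
depth = depth_from_disparity(disp, baseline=0.1, focal_px=800.0)
```

Since depth is inversely proportional to disparity, a one-pixel matching error costs far more depth accuracy for distant points than for near ones.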

There are some distinct advantages of using epipolar geometry for 3D face reconstruction. It provides sufficient geometric information about the faces. In addition, 3D information has the potential to improve the performance of face recognition systems.

5. EXPERIMENTAL RESULTS

We implemented the various features of our application behind a graphical user interface.

The algorithm receives a pair of stereo images (left and right image) as input, and outputs a disparity map (or the depth map).

The algorithm comprises the following steps.

Figure 2: Stereo 3D Reconstruction Process


• Read image pair IL and IR.

• Set the parameters which will be used.

• Initialize OL and OR to be zeros.

• Select K points from the right image.

• For each point in the right image, a corresponding point in the left image is obtained via correlation using the epipolar geometry.

• Compute parameters of a model M from matched 2D points via triangulation.

• Estimate disparity according to Equation (12).

• Generate depth map.

The 3D reconstruction algorithm must solve two basic problems: correspondence, which deals with finding an object in the left image that corresponds to an object in the right image, and reconstruction, which deals with finding the depth (i.e. the distance from the cameras which capture the stereo images) and structure of the corresponding point of interest.
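For the reconstruction problem, one common way to recover a 3D point from a matched pair is linear (DLT) triangulation. The paper does not state which triangulation variant it uses, so the sketch below is an assumption, shown with a hypothetical rectified camera pair; the function name and camera values are illustrative.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.

    P1, P2: 3x4 projection matrices; x1, x2: matching pixels (x, y).
    Each view contributes two linear equations in the homogeneous point X.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                       # null vector of A (least-squares sense)
    return X[:3] / X[3]

# Hypothetical rectified pair: identical intrinsics, 0.1 m horizontal baseline.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 2.0])
h1 = P1 @ np.append(X_true, 1.0)
h2 = P2 @ np.append(X_true, 1.0)
x1 = h1[:2] / h1[2]
x2 = h2[:2] / h2[2]
X_rec = triangulate(P1, P2, x1, x2)
```

For this rectified pair the horizontal disparity x1[0] − x2[0] equals bf / z = 0.1 · 800 / 2 = 40 pixels, which agrees with Equation (12).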

Experimental results are given to demonstrate the viability of the proposed 3D face reconstruction method.

Figure 3 shows a pair of stereo images and Figure 4 shows the corresponding points found using epipolar geometry. Figure 5 shows the corresponding depth map.

6. CONCLUSION

We have proposed an efficient 3D face reconstruction method using epipolar geometry with multiple images. The approach is robust, simple, and fast to implement compared with other algorithms. It provides a practical solution to the reconstruction problem.

Future work includes applying the 3D model to face animation and recognition, and using robust multi-view face alignment to automate the reconstruction.

Figure 3: Stereo Pair

Figure 4: Stereo Correspondence

Figure 5: Disparity (Depth) Map

REFERENCES

[1] Gaurav Gupta, Balasubramanian Raman, and Rama Bhargava, “Reconstruction of 3D Plane using Min-Max Approach”, International Journal of Recent Trends in Engineering, 2(1), November 2009.

[2] Nader Salman and Mariette Yvinec, “Surface Reconstruction from Multi-View Stereo of Large-Scale Outdoor Scenes”, The International Journal of Virtual Reality, 2010, 5(3), 1-6.

[3] M. Mozammel Hoque Chowdhury and Md. Al-Amin Bhuiyan, “A New Approach For Disparity Map Determination”, Daffodil International University Journal of Science and Technology, 4(1), January 2009.

[4] S. Kumar, N. Sukavanam, and R. Balasubramanian, “Reconstruction of Quadratic Curves in 3D Using Two or More Perspective Views: Simulation Studies”. Proc. SPIE 6066, 60660M (2006); doi:10.1117/12.637736.

[5] Zhang J. Chen, Jagath Samaranbandu, “Planar Region Depth Filling Using Edge Detection with Embedded Confidence Technique and Hough Transform”. In Proceedings of the International Conference on Multimedia and Expo, 2, pp. 89-92, 2003.

[6] S. Romdhani, V. Blanz, and T. Vetter. “Face Identification by Fitting a 3d Morphable Model Using Linear Shape and Texture Error Functions”. In Proceedings of the European Conference on Computer Vision, 4, pp. 3-19, 2002.

[7] D. Jiang, Y. Hu, S. Yan, L. Zhang, H. Zhang, and W. Gao. “Efficient 3d Reconstruction for Face Recognition”. Pattern Recognition, 38, 787-798, 2005.

[8] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. Salesin. “Synthesizing Realistic Facial Expressions from Photographs”. In SIGGRAPH’98 Conference Proceedings, pp. 75-84. ACM, July 1998.

[9] Tom Davis, “Homogeneous Coordinates and Computer Graphics”, November 20, 2001.

[10] A. Akaydin, “3D Face Reconstruction from 2D Images for Effective Face Recognition”, 2008.

[11] Makoto Kimura, Hideo Saito, “3D Reconstruction based on Epipolar Geometry”, MVA2000 IAPR Workshop on Machine Vision Applications, Nov. 28-30, 2000, The University of Tokyo, Japan.

[12] N. Sukavanam, R. Balasubramanian and S. Kumar, “Error Estimation of Quadratic Curves in 3D Space”, International Journal of Computer Mathematics, 84(1), pp. 121-132, 2007.