The Florence 2D/3D Hybrid Face Dataset

2
The Florence 2D/3D Hybrid Face Dataset Andrew D. Bagdanov, Alberto Del Bimbo, Iacopo Masi Media Integration and Communication Center, University of Florence, Italy {bagdanov, delbimbo, masi}@dsi.unifi.it - http://www.micc.unifi.it Abstract: This poster describes a new dataset collected by the Media Integration and Communication Center at the University of Florence. The dataset consists of: I High-resolution 3D scans of human faces from many subjects. I Several video sequences of varying resolution, conditions and zoom level for each subject. Each subject is recorded in the following situations: I In a controlled setting in HD video. I In a less-constrained (but still indoor) setting using a standard, PTZ surveillance camera. I In an unconstrained, outdoor environment under challenging recording conditions. This dataset is being constructed specifically to support research on techniques that bridge the gap between 2D, appearance-based recognition techniques, and fully 3D approaches. It is designed to simulate, in a controlled fashion, realistic surveillance conditions and to probe the efficacy of exploiting 3D models in real scenarios. 1 - Face as a biometric Figure 1: Examples of a high resolution mesh. Human faces are one of the most important biometrics for recognition. Face imagery is easily and non- intrusively collectible, whereas other biometrics such as fingerprints or iris scans are impractical to implement in many scenarios (e.g. a surveillance setting). Because of the universality of faces as a biometric, there has been a proliferation of face recognition approaches proposed in the research literature [10]. Along with the proliferation of algorithms for face recognition there has also been an explosion of datasets designed to support research in face recognition. Face recognition remains a difficult problem. Much of this difficulty is due to challenging imaging conditions and variations caused by expressions, gender and pose. More recently, 3D scanning technology has matured and the price of entry is much less. This has led to renewed interest in face recognition using 3D models of human faces. One unexplored avenue of research on facial analysis is the potential of using 3D models to augment the performance of traditional 2D, appearance-based techniques. 2 - Design of the Dataset Our dataset is designed with two main goals in mind. First, we would like to make available accurate and complete 3D models of faces to researchers who are primarily interested in the analysis of 3D meshes and textures of human faces. That is, our dataset is designed to be useful for research on pure 3D analysis techniques. We have also designed our dataset to go beyond the scope of 3D analysis techniques, allowing researchers to investigate the possibility of reducing the gap between 2D a computer vision algorithms and those methods that work on more precise, 3D models. In particular, our dataset is thought of in the context of evaluating the use of 3D information in computer vision problem like 3D face pose estimation [3] and 3D face recognition [1] directly from video data or still images. To this end, the pipeline of data acquisition is designed to provide both 3D data and 2D videos consistent with each other. I First, a 3D model is captured of each subject using a 3D scanner b . I Second, we record high definition (HD) video of the subject as he simulates a range of specific head rotations (this roughly corresponds to a cooperative environment). Five levels of zoom are used in order to capture the subject face at multiple image resolutions. I After this, the subject is then recorded from two PTZ Cameras, one indoor and the other outdoor. These two scenarios represent a more non-cooperative subject and he is asked to be spontaneous. Three levels of zoom are captured in each video in order to cover a broad range of face resolutions. 1) Scanner 3D 2) Cooperative Video 3) PTZ Indoor (spontaneous, talking, gestures) 4) PTZ Outdoor (a) Each step of the acquisition. 2) 1 Lateral Model (90°) 3) 1 Lateral Model (-90°) 1) 2 Frontal Models (one for testing and one for training) 4) 1 with glasses (if present) (b) 3D models acquired after the first step. Figure 2: Acquisition process and 3D Models. a When we use the term “2D” we refer to video or image sequences of database subjects. b To capture each 3D model we used a state-of-the-art “3dMDface System” scanner. 3 - The Data In terms of 3D data, our dataset contains high-resolution 3D models (both meshes and textures). Some example models are shown in figure 3. (a) Frontal 1 (b) Frontal 2 (c) Left Side (d) Right Side (e) Glasses Figure 3: Examples of each type of 3D model for five random subjects. The three types of videos recorded for each subject are defined and shown respectively in figure 4, 5 and 6. (a) Zoom level 1 (b) Zoom level 2 (c) Zoom level 3 (d) Zoom level 4 Figure 4: Samples of the HD video taken of subjects in a cooperative environment. (a) Zoom level 1 (b) Zoom level 2 (c) Zoom level 3 Figure 5: Examples of video sequence of a subject taking from a PTZ camera in an indoor environment. Three different levels of zoom are used to generate views of the subject face at different resolutions. Subject behavior is natural and unscripted. (a) Zoom level 1 (b) Zoom level 2 (c) Zoom level 3 Figure 6: Examples of a subject framed from a PTZ camera in an outdoor environment. As with the indoor sequences, three levels of zoom are used to generate views of the subject face at different resolutions and subject behavior is natural and unscripted. 4 - Comparison with Other Datasets The Florence 2D/3D Hybrid Face Dataset is unique in several aspects. First, we provide multiple, high-quality 3D models, both meshes and textures, plus multiple high-resolution videos of each subject. Also, the image sequences are designed to mimic a range of difficult, real-world application scenarios involving facial analysis. Other substantive differences are summarized in Table 1. Dataset Availability # Subjects Image/Video 3D Model HD Image or Video Annotations Florence 2D/3D Public 50+ No/Yes Mesh+Texture Yes None Texas 3DFRD [5] On Written Request 118 No/No Range+Texture No Fiducial Points FRAV3D [4] Not Publicly Available 106 No/No Range+Mesh+Tex. No None FRGC 2005 [8] On Request 466 Yes/No Range+Texture Yes None GavabDB [6] Public 61 No/No Mesh Only No None Casia3D [2] After Registration 123 No/No Mesh+Texture No None Bosphorus [9] Public 105 No/No 3D Map+Texture No Various BFM [7] On Written Request 200 No/No Mesh+Rendering No Various Table 1: A comparative examination of available datasets supporting research on human faces, especially considering those who provide 2D appearance information like video or images in addition to 3D data. 5 - Release (currently 50+ subjects) The dataset is still under construction, but is already available at the following URL: I Dataset Preview at: http://www.micc.unifi.it/masi/research/ffd/ I Final Dataset at: http://www.micc.unifi.it/datasets/3d-faces We would like to thank Andrea Ferracani for his efforts on the development of the web site making this dataset available to everybody.

Transcript of The Florence 2D/3D Hybrid Face Dataset

Page 1: The Florence 2D/3D Hybrid Face Dataset

The Florence 2D/3D Hybrid Face DatasetAndrew D. Bagdanov, Alberto Del Bimbo, Iacopo Masi

Media Integration and Communication Center, University of Florence, Italy

{bagdanov, delbimbo, masi}@dsi.unifi.it - http://www.micc.unifi.it

Abstract: This poster describes a new dataset collected by the Media Integration and Communication Center at the University of Florence. The dataset consists of:

I High-resolution 3D scans of human faces from many subjects.

I Several video sequences of varying resolution, conditions and zoom level for each subject.

Each subject is recorded in the following situations:

I In a controlled setting in HD video.

I In a less-constrained (but still indoor) setting using a standard, PTZ surveillance camera.

I In an unconstrained, outdoor environment under challenging recording conditions.

This dataset is being constructed specifically to support research on techniques that bridge the gap between 2D, appearance-based recognition techniques, and fully 3D approaches. It isdesigned to simulate, in a controlled fashion, realistic surveillance conditions and to probe the efficacy of exploiting 3D models in real scenarios.

1 - Face as a biometric

Figure 1: Examples of a high resolution mesh.

Human faces are one of the most important biometrics for recognition. Face imagery is easily and non-intrusively collectible, whereas other biometrics such as fingerprints or iris scans are impractical to implementin many scenarios (e.g. a surveillance setting). Because of the universality of faces as a biometric, therehas been a proliferation of face recognition approaches proposed in the research literature [10]. Along withthe proliferation of algorithms for face recognition there has also been an explosion of datasets designed tosupport research in face recognition.

Face recognition remains a difficult problem. Much of this difficulty is due to challenging imaging conditionsand variations caused by expressions, gender and pose. More recently, 3D scanning technology has maturedand the price of entry is much less. This has led to renewed interest in face recognition using 3D models ofhuman faces. One unexplored avenue of research on facial analysis is the potential of using 3D models toaugment the performance of traditional 2D, appearance-based techniques.

2 - Design of the Dataset

Our dataset is designed with two main goals in mind. First, we would like to make available accurate andcomplete 3D models of faces to researchers who are primarily interested in the analysis of 3D meshes andtextures of human faces. That is, our dataset is designed to be useful for research on pure 3D analysistechniques.

We have also designed our dataset to go beyond the scope of 3D analysis techniques, allowing researchersto investigate the possibility of reducing the gap between 2Da computer vision algorithms and thosemethods that work on more precise, 3D models. In particular, our dataset is thought of in the context ofevaluating the use of 3D information in computer vision problem like 3D face pose estimation [3] and 3Dface recognition [1] directly from video data or still images.

To this end, the pipeline of data acquisition is designed to provide both 3D data and 2D videos consistentwith each other.

I First, a 3D model is captured of each subject using a 3D scannerb.

I Second, we record high definition (HD) video of the subject as he simulates a range of specific headrotations (this roughly corresponds to a cooperative environment). Five levels of zoom are used inorder to capture the subject face at multiple image resolutions.

I After this, the subject is then recorded from two PTZ Cameras, one indoor and the other outdoor.These two scenarios represent a more non-cooperative subject and he is asked to be spontaneous.Three levels of zoom are captured in each video in order to cover a broad range of face resolutions.

1) Scanner 3D 2) Cooperative Video

3) PTZ Indoor

(spontaneous, talking, gestures)

4) PTZ Outdoor

(a) Each step of the acquisition.

2) 1 Lateral Model (90°) 3) 1 Lateral Model (-90°)

1) 2 Frontal Models(one for testing and one for training)

4) 1 with glasses(if present)

(b) 3D models acquired after the first step.

Figure 2: Acquisition process and 3D Models.

aWhen we use the term “2D” we refer to video or image sequences of database subjects.bTo capture each 3D model we used a state-of-the-art “3dMDface System” scanner.

3 - The Data

In terms of 3D data, our dataset contains high-resolution 3D models (both meshes and textures).Some example models are shown in figure 3.

(a) Frontal 1 (b) Frontal 2 (c) Left Side (d) Right Side (e) Glasses

Figure 3: Examples of each type of 3D model for five random subjects.

The three types of videos recorded for each subject are defined and shown respectively in figure 4, 5and 6.

(a) Zoom level 1 (b) Zoom level 2 (c) Zoom level 3 (d) Zoom level 4

Figure 4: Samples of the HD video taken of subjects in a cooperative environment.

(a) Zoom level 1 (b) Zoom level 2 (c) Zoom level 3

Figure 5: Examples of video sequence of a subject taking from a PTZ camera in an indoor environment. Three different levels of

zoom are used to generate views of the subject face at different resolutions. Subject behavior is natural and unscripted.

(a) Zoom level 1 (b) Zoom level 2 (c) Zoom level 3

Figure 6: Examples of a subject framed from a PTZ camera in an outdoor environment. As with the indoor sequences, three levels

of zoom are used to generate views of the subject face at different resolutions and subject behavior is natural and unscripted.

4 - Comparison with Other Datasets

The Florence 2D/3D Hybrid Face Dataset is unique in several aspects. First, we provide multiple, high-quality3D models, both meshes and textures, plus multiple high-resolution videos of each subject. Also, the imagesequences are designed to mimic a range of difficult, real-world application scenarios involving facial analysis.Other substantive differences are summarized in Table 1.

Dataset Availability # Subjects Image/Video 3D Model HD Image or Video Annotations

Florence 2D/3D Public 50+ No/Yes Mesh+Texture Yes NoneTexas 3DFRD [5] On Written Request 118 No/No Range+Texture No Fiducial PointsFRAV3D [4] Not Publicly Available 106 No/No Range+Mesh+Tex. No NoneFRGC 2005 [8] On Request 466 Yes/No Range+Texture Yes NoneGavabDB [6] Public 61 No/No Mesh Only No NoneCasia3D [2] After Registration 123 No/No Mesh+Texture No NoneBosphorus [9] Public 105 No/No 3D Map+Texture No VariousBFM [7] On Written Request 200 No/No Mesh+Rendering No Various

Table 1: A comparative examination of available datasets supporting research on human faces, especially considering those whoprovide 2D appearance information like video or images in addition to 3D data.

5 - Release (currently 50+ subjects)The dataset is still under construction, but is already available at the following URL:

I Dataset Preview at:http://www.micc.unifi.it/masi/research/ffd/

I Final Dataset at:http://www.micc.unifi.it/datasets/3d-faces

We would like to thank Andrea Ferracani for his efforts on thedevelopment of the web site making this dataset available to everybody.

Page 2: The Florence 2D/3D Hybrid Face Dataset

The Florence 2D/3D Hybrid Face DatasetAndrew D. Bagdanov, Alberto Del Bimbo, Iacopo Masi

Media Integration and Communication Center, University of Florence, Italy

{bagdanov, delbimbo, masi}@dsi.unifi.it - http://www.micc.unifi.it

[1] S. Berretti, A. Del Bimbo, and P. Pala.3D face recognition using isogeodesic stripes.IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(12):2162–2177, December 2010.

[2] Chinese Academy of Sciences’ Institute of Automation (CASIA).Casia-3d facev1.http://biometrics.idealtest.org/.

[3] G. Fanelli, J. Gall, and L. van Gool.Real time head pose estimation with random regression forests.In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’11), June 2011.

[4] Grupo de Reconocimiento Facial e Vision Artificial.FRAV3D face dataset.http://www.frav.es/databases/FRAV3d/.

[5] S. Gupta, K. R. Castleman, M. K. Markey, and A. C. Bovik.Texas 3D face recognition database.In Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, pages 97–100, 2010.

[6] A. Moreno and y A.Sanchez.Gavabdb: A 3D face database.In Proceedings of the 2nd COST Workshop on Biometrics on the Internet: Fundamentals, Advances and Applications, pages 77–82, 2004.

[7] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter.A 3D face model for pose and illumination invariant face recognition.In Proceedings of the 6th IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), 2009.

[8] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek.Overview of the face recognition grand challenge.In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 947 – 954 vol. 1, june 2005.

[9] A. Savran, N. Alyuz, H. Dibeklioglu, O. Celiktutan, B. Gokberk, B. Sankur, and L. Akarun.Bosphorus database for 3D face analysis.In Proceedings of the First COST 2101 Workshop on Biometrics and Identity Management (BIOID 2008), 2008.

[10] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld.Face recognition: A literature survey.ACM Computing Surveys, 35:399–458, December 2003.