Depth Camera in Computer Vision and Computer Graphics: An Overview

XIANG Xueqin, PAN Zhigeng+, TONG Jing

State Key Lab of Computer Aided Design and Computer Graphics, Zhejiang University, Hangzhou 310058, China
+ Corresponding author: E-mail: [email protected]

* Supported by the National Natural Science Foundation of China under Grant No. 60970076 and the National High-Tech Research and Development Plan of China (863) under Grant No. 2009AA062704. Received 2011-02, Accepted 2011-04. DOI: 10.3778/j.issn.1673-9418.2011.06.001

XIANG Xueqin, PAN Zhigeng, TONG Jing. Depth camera in computer vision and computer graphics: an overview. Journal of Frontiers of Computer Science and Technology, 2011, 5(6): 481-492.

Abstract: An increasing number of applications, such as geometry reconstruction, collision prevention, mixed reality and gesture recognition, depend on accurate and fast 3D scene analysis. The acquisition of a range map by image-based analysis or laser scan techniques is still time-consuming and expensive. As an alternative distance-measuring device, the depth camera offers advantages that traditional 3D measuring systems do not, e.g., lower price and higher frame rate. Recently, significant improvements have been made toward low-cost and compact depth camera devices that have the potential to revolutionize many fields of research, including computer vision, computer graphics and human computer interaction (HCI). These technologies are also starting to attract many researchers working for academic or commercial purposes. This paper gives an overview of recent developments in depth camera technology and discusses the current state of the integration of this technology into various related applications in computer vision and computer graphics.

Key words: depth camera; computer vision; computer graphics

1 Introduction

Acquiring 3D geometric information from real environments is an essential task for many applications in computer vision and computer graphics. Numerous tasks, such as cultural heritage preservation, augmented reality and human computer interaction, clearly favor simple and accurate devices for real-time range image acquisition. Unfortunately, even for static scenes, there exists no low-priced off-the-shelf system that can provide good-quality, high-resolution distance information in real time. Laser scanning techniques, which sample a scene row by row with a single laser device, are rather time-consuming and therefore infeasible for dynamic scenes. Stereo vision systems are also limited: they are known to be quite fragile in practice (e.g., due to lack of texture).

As a newly developed distance measuring technology, the depth camera opens a new epoch for 3D geometric information acquisition. Unlike other 3D systems, the depth camera is very compact, and it already fulfills most of the features desired for real-time distance measurement, such as full-field range acquisition and a high frame rate.

Two main approaches are currently employed in depth camera technology. The first is based on the time-of-flight (ToF) principle, measuring the time delay between the emission of a light pulse and the arrival of its reflection. Some solutions utilize modulated, incoherent light with a radio-frequency (RF) carrier and measure the phase shift of that carrier on the receiver side (e.g., the Photonic Mixer Device (PMD) [1] and Swiss Ranger 4000 [2]). With phase unwrapping algorithms, the maximum uniqueness range can be increased. The Swiss Ranger 4000 (http://www.mesa-imaging.ch, Fig.1 (a)) has ranges of 5 or 10 meters with 176×144 pixels. The PMD (http://www.pmdtec.com, Fig.1 (b)) can provide ranges up to 60 meters. On the other hand, the 3DV Inc. cameras (http://www.3dvsystems.com) [3] and Canesta 3D cameras (http://www.canesta.com) are range-gated systems using Medina's design [4], which measure the time of flight indirectly with a fast shutter technique.
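For the phase-shift variant, the measured phase relates to distance in a simple way. The following is the standard continuous-wave ToF relation, not a formula quoted from the cited product documentation: with modulation frequency $f_{mod}$ and measured phase shift $\Delta\varphi$,

$$d = \frac{c}{2}\cdot\frac{\Delta\varphi}{2\pi f_{mod}}, \qquad d_{max} = \frac{c}{2 f_{mod}},$$

where $c$ is the speed of light and $d_{max}$ is the maximum unambiguous range that phase unwrapping extends. For example, $f_{mod} = 15$ MHz gives $d_{max} = 10$ m, matching the 10-meter range quoted above.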

Fig.1 Different types of depth camera

The second approach is based on light coding: a known infrared pattern is projected onto the scene, and depth is determined from the pattern's deformation as captured by an infrared CMOS imager. Driven by a single-chip custom-silicon solution, such a device, e.g., the PrimeSensor (http://www.primesense.com, Fig.1 (c)), can produce depth images of up to 640×480 pixels at a maximum throughput of 60 f/s. The recently popular Microsoft Kinect sensor (http://www.xbox.com/kinect, Fig.1 (d)) also uses light coding for depth measurement; the underlying triangulation relation is sketched below.
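As a minimal sketch of the geometry (generic structured-light triangulation under a pinhole model, not PrimeSense's proprietary formulation): with a baseline $b$ between projector and imager, a focal length $f$ in pixels, and an observed pattern disparity $d$ relative to a calibrated reference view,

$$Z = \frac{f\,b}{d},$$

so the pattern's deformation (disparity) maps directly to per-pixel depth.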

This overview first summarizes the measurement principles of depth cameras (Section 1). Sections 2 and 3 discuss sensor calibration issues and basic concepts in image processing and sensor fusion. Section 4 focuses on applications in geometric reconstruction, human-oriented analysis, and interaction based on depth cameras. Finally, Section 5 draws a conclusion and gives a perspective on future work in depth camera related research and applications.

2 Calibration

Depth cameras use standard optics to focus the reflected active light onto the chip. Thus, classical intrinsic calibration is required to compensate for effects like shifted optical centers and lateral distortion. For depth cameras with relatively high resolution, e.g., 176×144, standard calibration techniques [5] can be used. For low resolution sensors, Beder [6] proposed an optimization approach based on analysis-by-synthesis.
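A minimal sketch of such a standard intrinsic calibration, assuming the depth camera's amplitude (intensity) image can serve as a grayscale view of a checkerboard; file names and board geometry are illustrative placeholders:

```python
import cv2
import numpy as np

PATTERN = (9, 6)   # inner checkerboard corners (illustrative)
SQUARE = 0.025     # square edge length in meters (illustrative)

# Template of 3D corner positions on the board plane (z = 0).
obj = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts = [], []
for path in ["amp_%02d.png" % i for i in range(20)]:  # amplitude images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (5, 5), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(obj)
        img_pts.append(corners)

# Intrinsic matrix K and lens distortion coefficients of the depth sensor.
rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, img_size, None, None)
print("reprojection RMS:", rms)
```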

To evaluate the error of a depth camera, the acquisition of reference data ("ground truth") is a non-trivial task. Previous approaches use track lines [7], which unfortunately require cost-intensive experiments. Alternative techniques use an image-based approach to estimate the extrinsic parameters of the sensor with respect to a reference plane, e.g., a checkerboard [8].

Considering the systematic measurement error, a first approach [9] assumed a linear deviation with respect to the object's distance. This systematic depth error can then be corrected using look-up tables [10] or B-splines [5]. Since the systematic error behaves quite similarly across different sensor types [11], it was a significant improvement when Zhu et al. [10] combined a ToF sensor with passive stereo (see Fig.2) to obtain high-accuracy depth maps. Their approach is based on the observation that ToF sensors have error characteristics complementary to those of passive stereo.
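A minimal sketch of the look-up-table/B-spline style of correction; the readings and the error model below are synthetic stand-ins, not calibration data from the cited papers:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# measured: sensor readings at known reference distances (meters);
# truth: the corresponding ground-truth distances. Toy systematic wiggle.
measured = np.linspace(1.0, 7.5, 40)
truth = measured - (0.02 * np.sin(2.5 * measured) + 0.01)

# Fit a smooth 1D correction curve, a stand-in for the B-spline LUT of [5,10].
correction = UnivariateSpline(measured, truth - measured, k=3, s=1e-5)

def correct_depth(d):
    """Apply the calibrated systematic-error correction to raw readings."""
    return d + correction(d)

print(correct_depth(np.array([2.0, 5.0])))
```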

Fig.2 Multi-sensor calibration in [10]

Unfortunately, the captured range data are typically contaminated by noise. The noise level of the distance measurement depends on the amount of incident active light. An additional depth error related to the reflected intensity is also observed [11]: object regions with low near-infrared (NIR) reflectivity show a non-zero mean offset compared to regions with high reflectivity. In [8], the systematic and the intensity-related errors were compensated using a bivariate correction function based on B-splines applied directly to the distance values, assuming both effects to be coupled. Alternatively, Chan et al. [12] proposed an adaptive multi-lateral filter that takes into account the inherent noisy nature of real-time depth data.

Regarding multiple reflections, the authors of [13-14] proposed a model for multiple reflections as well as a technique for correcting the affected measurements. It is assumed that the perturbation components due to multiple reflections outside and inside the camera depend on the scene and on the camera construction, respectively. The perturbation therefore consists mainly of low spatial frequencies, which can be compensated using a model of the signal as a complex quantity whose modulus and argument are the amplitude and the distance, respectively. In short, this model is useful if an additional light pattern can be projected onto the object.

Device manufacturers also attempt to reduce motion artifacts, which are mainly caused by the latency between the individual exposures for the four phase images. The problem nevertheless remains, and might be addressed by motion-compensated integration of the individual measurements or by motion deblurring methods [15].
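The four phase images mentioned above are the four samples $A_0,\ldots,A_3$ of the correlation function taken at phase offsets of 0°, 90°, 180° and 270°. In the standard four-bucket scheme (a generic formulation, up to sign conventions, not tied to a specific vendor), phase and amplitude follow as

$$\Delta\varphi = \arctan\frac{A_3 - A_1}{A_0 - A_2}, \qquad a = \frac{\sqrt{(A_3 - A_1)^2 + (A_0 - A_2)^2}}{2},$$

which makes clear why object motion between the four exposures corrupts the depth estimate.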

3 Range Image Processing and Multi-Sensor Fusion

Before using the range data from a depth camera, some pre-processing of the input data is usually required. Current-generation sensors provide noise-contaminated range data of comparably low image resolution (e.g., only up to 176×144 for the Swiss Ranger 4000). To remove outliers caused by random noise, a bilateral filter is typically used to refine the range data [16].
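A minimal sketch of this denoising step, assuming the raw range map is available as a single-channel float32 array in meters (the file name is a placeholder):

```python
import cv2
import numpy as np

# depth: raw range map in meters, float32, shape (144, 176).
depth = np.fromfile("raw_range.f32", dtype=np.float32).reshape(144, 176)

# Edge-preserving smoothing: a pixel is averaged only with neighbors that
# are close both spatially (sigmaSpace, in pixels) and in range value
# (sigmaColor, here 5 cm), so depth discontinuities are preserved.
denoised = cv2.bilateralFilter(depth, d=5, sigmaColor=0.05, sigmaSpace=3)
```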

To upsample the resolution of depth camera data, most approaches build on the observation that depth discontinuities are usually related to color changes in the corresponding color image. In [17], a Markov random field (MRF) was first constructed from the low resolution depth map and the high resolution camera image. Unfortunately, this method gives promising spatial resolution enhancement only up to 10×. Yang et al. [18] then presented a method that models a cost volume of depth probability and iteratively applies a bilateral filter [16] to refine the cost volume. Another recent method [2] utilizes depth maps exclusively, without the aid of a color image: a sequence of low resolution depth maps of the same scene is aligned and then merged to obtain a single depth map with improved resolution, but this method is restricted to the acquisition of static scenes. We therefore presented a simple pipeline [19] to enhance the quality as well as improve the spatial and depth resolution of range data in real time (see Fig.3). Similarly, using information from one or more additional high resolution vision cameras, Tian et al. [20] considered the problem of upsampling a low resolution depth map generated by a range camera to provide an accurate high resolution depth map from the viewpoint of one of the vision cameras.

Fig.3 Depth camera data denoising
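A compact sketch of the color-guided upsampling idea shared by [17-18], reduced to a single joint-bilateral pass (the full methods iterate an MRF or a cost volume; function and parameter names are illustrative, and the brute-force loops favor clarity over speed):

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, color_hi, scale, ss=2.0, sc=10.0):
    """Upsample depth_lo (h, w) onto the grid of color_hi (H, W, 3 float32),
    weighting low-res depth samples by spatial and color similarity."""
    H, W, _ = color_hi.shape
    out = np.zeros((H, W), np.float32)
    r = 2  # neighborhood radius in low-res pixels
    for y in range(H):
        for x in range(W):
            cy, cx = y / scale, x / scale  # position in low-res coordinates
            wsum = dsum = 0.0
            for j in range(int(cy) - r, int(cy) + r + 1):
                for i in range(int(cx) - r, int(cx) + r + 1):
                    if 0 <= j < depth_lo.shape[0] and 0 <= i < depth_lo.shape[1]:
                        gs = np.exp(-((j - cy) ** 2 + (i - cx) ** 2) / (2 * ss ** 2))
                        dc = color_hi[y, x] - color_hi[min(int(j * scale), H - 1),
                                                       min(int(i * scale), W - 1)]
                        gc = np.exp(-float(dc @ dc) / (2 * sc ** 2))
                        w = gs * gc
                        wsum += w
                        dsum += w * depth_lo[j, i]
            out[y, x] = dsum / max(wsum, 1e-8)
    return out
```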

From a practical point of view, a higher resolution is needed for color than for depth information. Therefore, different combinations of high resolution video cameras and lower resolution depth cameras have been studied. Many researchers use a binocular combination of a depth camera with one [16] or several RGB cameras [21] to upsample the low resolution ToF data with high resolution color information. Such a fixed sensor combination makes it possible to compute the rigid 3D transformation between the optical centers of the two sensors (external calibration) as well as the intrinsic camera parameters of each sensor. Utilizing these transformations, the 3D points provided by the depth camera are co-registered with the 2D image, so color information can be assigned to each 3D point.
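A minimal sketch of this co-registration, assuming calibrated intrinsics K_d and K_c for the depth and color cameras and a rigid transform (R, t) from the depth frame to the color frame; all names are illustrative:

```python
import numpy as np

def colorize_points(depth, K_d, K_c, R, t, color):
    """Back-project depth pixels to 3D, map them into the color camera,
    and sample one color per 3D point (bounds-checked, nearest neighbor)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    pix = np.stack([u.ravel() * z, v.ravel() * z, z])   # z * homogeneous pixel
    pts = np.linalg.inv(K_d) @ pix                      # 3D in the depth frame
    pts_c = R @ pts + t.reshape(3, 1)                   # 3D in the color frame
    proj = K_c @ pts_c
    zc = np.maximum(proj[2], 1e-9)                      # guard the division
    uc = np.round(proj[0] / zc).astype(int)
    vc = np.round(proj[1] / zc).astype(int)
    ok = (z > 0) & (proj[2] > 0) & \
         (uc >= 0) & (uc < color.shape[1]) & (vc >= 0) & (vc < color.shape[0])
    return pts.T[ok], color[vc[ok], uc[ok]]
```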

There are also a number of monocular systems, which combine a depth camera with a conventional image sensor. These have the advantage of making data fusion easier, but they require more sophisticated optics, hardware and algorithms. The recently released Microsoft Kinect is a good example of a monocular 2D/3D camera aimed at video games. The device features an RGB camera and a depth sensor running proprietary software, which together provide full-body 3D motion capture, facial recognition and voice recognition.

Another research direction investigates combining depth cameras with classical stereo techniques. In [22], it was first shown that a ToF-stereo combination can greatly speed up the stereo algorithm while helping to handle textureless regions. A global data fusion algorithm that incorporates belief propagation over depth from stereo images together with ToF depth data was proposed in [10]: both depth estimates are combined in an MRF to obtain a fused, superior depth map. Readers interested in more technical details may refer to [23], where the authors built a hybrid camera system composed of a stereoscopic camera and a time-of-flight depth camera to generate high-quality, high-resolution video-plus-depth.

A recent technique [24] for improving the accuracy of range maps measured by ToF cameras is based on the observation that the range map and the intensity image are not independent but are linked by a shading constraint: if the reflectance properties of the surface are known, a given range map implies a corresponding intensity image (see Fig.4). The main limitation of this method is that it does not cope well with range discontinuities, but this may be overcome by ignoring any mesh triangle that straddles a range discontinuity.
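For a Lambertian surface, the constraint can be written compactly; this is a generic formulation, not necessarily the exact energy used in [24]. With surface normal $\mathbf{n}(x)$ derived from the range map, known light direction $\mathbf{l}$, and albedo $\rho$, the predicted intensity is

$$I_{pred}(x) = \rho\,\max\bigl(0,\ \mathbf{n}(x)\cdot\mathbf{l}\bigr),$$

so discrepancies between $I_{pred}$ and the observed intensity image can be penalized when refining the range map.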


Fig.4 3D reconstruction of a human face using shading constraint

Fig.5 3D reconstruction based on depth camera

4 Applications of Depth Cameras

4.1 Geometry Extraction and 3D Reconstruction

Depth cameras typically record their surroundings at high frame rates, e.g., up to 30 f/s for the Microsoft Kinect. These sensors are therefore especially well suited for directly capturing 3D scene geometry in static and even dynamic environments. A 3D map of the environment can be captured by sweeping the depth camera and registering all scene geometry into a consistent reference coordinate system [25]. Kim et al. [26] proposed an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors. They first combine the multi-view ToF sensor measurements to obtain a coarse but complete model. The initial model is then refined by a probabilistic multi-view fusion framework, optimizing an energy function that aggregates ToF depth sensor information with multi-view stereo and silhouette constraints. Fig.5 (a) and (b) show a sample acquired with this kind of approach.
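At the core of such registration is the estimation of a rigid transform between corresponding 3D point sets. A minimal closed-form sketch (the classical SVD/Kabsch solution, assuming correspondences are already established, e.g., by an ICP loop around it):

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) with R @ P + t ~= Q.
    P, Q: (3, N) arrays of corresponding points."""
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T                 # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```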

For high quality 3D reconstruction, Fuchs et al. [27] investigated how well the known 3D geometry of a cube could be reconstructed from ToF sensor data. Guan et al. [28] presented a system that combines multiple ToF cameras with a set of video cameras to simultaneously reconstruct dynamic 3D objects using shape-from-silhouettes and range data. After defining sensing models for each type of sensor, they solved the reconstruction problem robustly using Bayesian inference. A probabilistic ad hoc fusion algorithm [29-30] was then derived to obtain relatively high quality 3D reconstruction results from the information of both the ToF camera and a stereo pair. According to the experimental results, this ad hoc fusion algorithm led to a very accurate calibration suitable for the fusion step, which in turn allowed precise extraction of the depth information. On the other hand, the low resolution, small field-of-view scans of a depth camera can be merged or aligned to exploit the additive information among these views. Cui et al. [31] described a method for 3D object scanning that aligns depth scans taken from around an object with a time-of-flight camera (see Fig.5 (c)). This easy-to-use approach makes depth cameras attractive for 3D object scanning.

High quality 3D reconstruction can also be achieved using a structure from motion (SFM) approach [32-33]. The inherent problem of SFM, however, is that no metric scale is available. This can be solved by exploiting the metric properties of the depth measurements [34]. The SFM approach thus allows reconstructing metric scenes with high resolution at interactive rates, e.g., for 3D map building and navigation [35]. Since color and depth can be obtained simultaneously, free viewpoint rendering is easily incorporated using depth compensated warping [36]. We also proposed a 3D reconstruction method for non-rigid objects using one depth camera [37], and then extended this method to scan hairstyles [38] (see Fig.6).

Fig.6 Hairstyle scanning using one depth camera

Simultaneous reconstruction of a scene with a wide field of view and dynamic scene analysis can be accomplished by jointly mounting a depth/color camera pair on a computer-driven pan-tilt unit and scanning the environment in a controlled manner. While scanning the scene, a 3D panorama can be obtained by stitching both the depth and the color images into a common cylindrical or spherical panorama. From the center point given by the position of the pan-tilt unit, a 3D environment model can thus be reconstructed in a preparation phase. Dynamic 3D scene content such as person movements can then be acquired online by adaptive object tracking with the camera head [39].

4.2 Human-Oriented Analysis

A number of human-oriented applications based on depth cameras have appeared in the last few years. For example, ToF camera systems can successfully detect the respiratory motion of human subjects [40]. One example is emission tomography, where respiratory motion may be the main cause of image quality degradation. In such cases, ToF camera systems can detect three-dimensional, markerless respiratory motion in real time with an accuracy of 0.1 mm, which makes them clearly competitive with other image based approaches [41]. A further paper [42] used ToF cameras to monitor respiration during sleep and to detect sleep apnea. ToF cameras have also been reported [43] to perform facial identification from single views of real depth images acquired with an off-the-shelf 3D time-of-flight depth camera.

Some medical applications, such as cancer treatment, require repositioning the patient to a previously defined position. Depth cameras have been used in this situation by segmenting the patient's body and performing a rigid 3D-3D surface registration [44]. Also, in an iris capture scenario, it has been reported [45] that a depth sensor (see Fig.7 (a)) was used in an iris deblurring algorithm, improving the robustness and non-intrusiveness of iris capture.

Fig.7 Human-oriented applications using depth camera

Depth cameras are also useful for motion detection. In [46], Liao et al. first utilized a single depth camera to reconstruct complete 3D deformable models (e.g., the human body) over time, provided that most parts of the model are observed by the camera at least once. Unlike the well-studied structure from motion methods, their approach can handle time-varying objects that deform arbitrarily but predictably. Acting like a touch sensor, depth cameras have also been used to detect touch on a tabletop [47].

Automatic detection and pose estimation of humans is an important task in human computer interaction (HCI). In [48], Jain and Subramanian presented a model based approach for detecting and estimating human pose by fusing depth and RGB color data from a monocular view. A further study was released by Ganapathi et al. [49], who derived an efficient filtering algorithm for markerless tracking of human pose in real time using a stream of monocular depth images (see Fig.7 (b)). The key idea of their approach is to combine an accurate generative model, made tractable by programmable graphics hardware, with a discriminative model that provides data-driven evidence about body part locations. Since accurate real-time tracking of humans and other articulated bodies has enticed researchers for many years, their work opens a new door for a large number of useful applications. Most recently, Shotton et al. [50] proposed a new method to quickly and accurately predict the 3D positions of body joints from a single depth image, using no temporal information. By breaking the whole skeleton into parts, their system runs at 200 f/s on consumer hardware (i.e., the Microsoft Kinect) while achieving state of the art accuracy.
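A sketch of the kind of per-pixel depth comparison feature that drives such body-part classifiers, following the general form described in [50]; the depth-normalized offsets make the response invariant to the subject's distance, and the names and background constant are illustrative:

```python
import numpy as np

BIG = 1e6  # value returned for background / out-of-image probes

def depth_feature(depth, x, y, u, v):
    """Depth-difference feature f = d(p + u/d(p)) - d(p + v/d(p)).
    depth: (H, W) range map in meters; (x, y): probe pixel;
    u, v: 2D pixel offsets (chosen at random during training)."""
    d = depth[y, x]
    if d <= 0:
        return 0.0  # invalid probe pixel
    def probe(off):
        px = x + int(round(off[0] / d))  # offset shrinks with distance
        py = y + int(round(off[1] / d))
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return BIG
    return probe(u) - probe(v)
```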

4.3 User Interaction and User Tracking

Depth cameras have obvious potential for interactive systems such as alternative input devices, games, animated avatars, etc. In early work [51], Oggier et al. used a ToF camera to track the hand and thereby allow touch-free interaction with a large virtual interactive screen. Soutschek et al. [52] then presented a similar application for touch-free navigation in a 3D medical visualization.

User interaction often requires an image matting operation, which extracts an object of interest by recovering per-pixel opacity against its background.
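Formally, matting inverts the standard compositing model, in which each observed pixel is a blend of foreground and background layers:

$$I(x) = \alpha(x)\,F(x) + \bigl(1 - \alpha(x)\bigr)\,B(x),$$

where $I$ is the observed color, $F$ and $B$ the unknown foreground and background colors, and $\alpha \in [0,1]$ the matte; depth provides a strong prior on which pixels belong to the foreground.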

More recently, Zhu et al. [53] proposed an automatic matting technique that combines a ToF camera with a stereo rig. The key idea of their method is to fuse information from the ToF sensor and the stereo camera to jointly and iteratively optimize the depth map and the alpha matte. Fig.8 shows an overview of their method; for more details we refer the reader to [53].

Fig.8 Overview of the algorithm in [53]

Recently, many works have considered the application of depth cameras to user tracking and man-machine interaction. Tracking people in a smart room, i.e., a multi-modal environment where the audible and visible actions of people inside the room are recorded and analyzed automatically, can benefit from the use of ToF sensors [22]. A different tracking approach is discussed in [54], where only one ToF sensor is utilized to observe a scene at an oblique angle. As for tracking non-rigid objects, in particular human faces, Cai et al. [55] proposed a regularized maximum likelihood deformable model fitting (DMF) algorithm for 3D face tracking with a commodity depth camera. They regularize the noisy depth data in the ICP framework by using a novel l1 regularization scheme. Fig.9 demonstrates some tracking results using their algorithm.

Fig.9 Example tracking results using the algorithm in [55]


References:

[6] Beder C, Koch R. Calibration of focal length and 3D pose based on the reflectance and depth image of a planar object[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 285-294.
[7] Kahlmann T, Remondino F, Guillaume S. Range imaging technology: new developments and applications for people identification and tracking[J]. Proceedings of SPIE, 2007, 6491: 1-12.
[8] Lindner M, Kolb A. Calibration of the intensity-related distance error of the PMD ToF-camera[J]. Proceedings of SPIE, 2007, 6764: 35.
[9] Kuhnert K D, Stommel M. Fusion of stereo camera and PMD-camera data for real-time suited precise 3D environment reconstruction[C]//Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '06), Beijing, China, 2006: 4780-4785.
[10] Zhu J J, Wang L, Yang R G, et al. Reliability fusion of time-of-flight depth and stereo for high quality depth maps[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2010.
[11] Rapp H. Experimental and theoretical investigation of correlating ToF-camera systems[D]. Heidelberg, Germany: University of Heidelberg, 2007.
[12] Chan D, Buisman H, Theobalt C, et al. A noise-aware filter for real-time depth upsampling[C]//Proceedings of the Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications (M2SFA2 '08), Marseille, France, October 12-18, 2008: 1-12.
[13] Falie D, Buzuloiu V. Distance errors correction for the time of flight (ToF) cameras[C]//Proceedings of the European Conference on Circuits and Systems for Communications (ECCSC '08), Bucharest, Romania, July 10-11, 2008: 193-196.
[14] Falie D. 3D image correction for time of flight (ToF) cameras[J]. Proceedings of SPIE, 2008, 7156: 133.
[15] Tai Y W, Kong N, Lin S, et al. Coded exposure imaging for projective motion deblurring[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '10), San Francisco, USA, June 13-18, 2010: 1-8.
[16] Huhle B, Schairer T, Jenke P, et al. Robust non-local denoising of colored depth data[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW '08), Anchorage, Alaska, USA, June 23-28, 2008.
[17] Diebel J, Thrun S. An application of Markov random fields to range sensing[C]//Proceedings of the Conference on Neural Information Processing Systems (NIPS '05), Vancouver, British Columbia, Canada, December 5-8, 2005.
[18] Yang Q X, Yang R G, Davis J, et al. Spatial-depth super resolution for range images[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '07), Minneapolis, Minnesota, USA, June 18-23, 2007: 1-8.
[19] Xiang X Q, Li G X, Pan Z G, et al. Real-time spatial and depth upsampling for range data[J]. LNCS Transactions on Computational Science, 2011, 6670: 78-98.
[20] Tian C, Vaishampayan V, Zhang Y F. Upsampling range camera depth maps using high-resolution vision camera and pixel-level confidence classification[J]. Proceedings of SPIE, 2011, 7863.
[21] Guomundsson S A, Larsen R, Aanaes H, et al. ToF imaging in smart room environments towards improved people tracking[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW '08), Anchorage, Alaska, USA, June 23-28, 2008.
[22] Gudmundsson S A, Aanaes H, Larsen R. Fusion of stereo vision and time-of-flight imaging for improved 3D estimation[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 425-433.
[23] Kim S Y, Koschan A, Mongi A A, et al. Three-dimensional video contents exploitation in depth camera-based hybrid camera system[M]//Signals and Communication Technology: High-Quality Visual Experience. [S.l.]: Springer, 2010: 349-369.
[24] Böhme M, Haker M, Martinetz T, et al. Shading constraint improves accuracy of time-of-flight measurements[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '08), Anchorage, Alaska, USA, June 24-26, 2008: 1-8.

[25] Huhle B, Jenke P, Strasser W. On-the-fly scene acquisition with a handy multisensory system[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 255-263.
[26] Kim Y M, Theobalt C, Diebel J, et al. Multi-view image and ToF sensor fusion for dense 3D reconstruction[C]//Proceedings of the IEEE Workshop on 3-D Digital Imaging and Modeling (3DIM '09), Kyoto, Japan, October 3-4, 2009: 1542-1549.
[27] Fuchs S, May S. Calibration and registration for precise surface reconstruction[C]//Proceedings of the Dynamic 3D Imaging Workshop in Conjunction with DAGM (Dyn3D), Heidelberg, Germany, September 2007.
[28] Guan L, Franco J S, Pollefeys M. 3D object reconstruction with heterogeneous sensor data[C]//Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT '08), Atlanta, USA, June 18-20, 2008: 295-302.
[29] Mutto C D, Zanuttigh P, Cortelazzo G M. Accurate 3D reconstruction by stereo and ToF data fusion[C]//Proceedings of the GTTI Meeting 2010, Brescia, Italy, May 2010.
[30] Mutto C D, Zanuttigh P, Cortelazzo G M. A probabilistic approach to ToF and stereo data fusion[C]//Proceedings of the 5th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT '10), Paris, France, May 17-20, 2010.
[31] Cui Y, Schuon S, Chan D, et al. 3D shape scanning with a time-of-flight camera[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '10), San Francisco, USA, June 13-18, 2010: 1-8.
[32] Bartczak B, Koeser K, Woelk F, et al. Extraction of 3D freeform surfaces as visual landmarks for real-time tracking[J]. Journal of Real-Time Image Processing, 2007, 2(2/3): 81-101.
[33] Koeser K, Bartczak B, Koch R. Robust GPU-assisted camera tracking using free-form surface models[J]. Journal of Real-Time Image Processing, 2007, 2(2/3): 133-147.
[34] Streckel B, Bartczak B, Koch R, et al. Supporting structure from motion with a 3D range-camera[C]//Lecture Notes in Computer Science 4522: Proceedings of the 15th Scandinavian Conference on Image Analysis. Berlin, Heidelberg: Springer-Verlag, 2007: 233-242.
[35] Prusak A, Melnychuk O, Roth H, et al. Pose estimation and map building with a PMD-camera for robot navigation[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 355-364.
[36] Koch R, Evers-Senne J. View synthesis and rendering methods[M]//3D Video Communication: Algorithms, Concepts and Real-time Systems in Human Centered Communication. [S.l.]: Wiley, 2005: 151-174.
[37] Tong J, Xiang X Q, Pan Z G, et al. 3D reconstruction of non-rigid shapes using one ToF camera[J]. Journal of Computer-Aided Design & Computer Graphics, 2011, 23(3): 377-384.
[38] Tong J, Zhang M M, Xiang X Q, et al. 3D body scanning with hairstyle using one time-of-flight camera[J]. Journal of Computer Animation and Virtual Worlds, 2011, 22(2/3): 203-211.
[39] Bartczak B, Schiller I, Beder C, et al. Integration of a time-of-flight camera into a mixed reality system for handling dynamic scenes, moving viewpoints and occlusions in real-time[C]//Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT '08), Atlanta, USA, June 18-20, 2008: 295-302.
[40] Schaller C, Penne J, Hornegger J. Time-of-flight sensor for respiratory motion gating[J]. Medical Physics, 2008, 35(7): 3090-3093.
[41] Penne J, Schaller C, Hornegger J, et al. Robust real-time 3D respiratory motion detection using time-of-flight cameras[J]. International Journal of Computer Assisted Radiology and Surgery, 2008, 3(5): 427-431.
[42] Falie D, Ichim M, David L. Respiratory motion visualization and the sleep apnea diagnosis with the time of flight (ToF) camera[C]//Proceedings of the International Conference on Visualization, Imaging and Simulation (VIS '08), Bucharest, Romania, November 7-9, 2008: 179-184.

[43] Ding H, Moutarde F, Shaiek A. 3D object recognition and facial identification using time averaged single-views from time-of-flight 3D depth-camera[C]//Proceedings of the Eurographics Workshop on 3D Object Retrieval, Norrköping, Sweden, May 3-7, 2010.
[44] Adelt A, Schaller C, Penne J. Patient positioning using 3D surface registration[C]//Proceedings of the Russian-Bavarian Conference on Biomedical Engineering, Moscow, Russia, July 8-9, 2008: 202-207.
[45] Huang X Y, Ren L, Yang R G. Image deblurring for less intrusive iris capture[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '09), Miami, Florida, USA, June 20-26, 2009: 1558-1565.
[46] Liao M, Zhang Q, Wang H M, et al. Modeling deformable objects from a single depth camera[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV '09), Kyoto, Japan, September 27 - October 4, 2009: 167-174.
[47] Wilson A. Using a depth camera as a touch sensor[C]//Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (ITS '10), Saarbrucken, Germany, November 7-10, 2010: 69-72.
[48] Jain H P, Subramanian A. Real-time upper-body human pose estimation using a depth camera, HPL-2010-190[R]. 2010.
[49] Ganapathi V, Plagemann C, Koller D, et al. Real time motion capture using a single time-of-flight camera[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '10), San Francisco, USA, June 13-18, 2010: 755-762.
[50] Shotton J, Fitzgibbon A, Cook M, et al. Real-time human pose recognition in parts from single depth images[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '11), Colorado Springs, USA, June 20-25, 2011.
[51] Oggier T, Büttgen B, Lustenberger F, et al. SwissRanger SR3000 and first experiences based on miniaturized 3D ToF cameras[C]//Proceedings of the 1st Range Imaging Research Day at ETH Zurich, 2005: 97-108.
[52] Soutschek S, Penne J, Hornegger J, et al. 3D gesture-based scene navigation in medical imaging applications using time-of-flight cameras[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW '08), Anchorage, Alaska, USA, June 23-28, 2008.
[53] Zhu J J, Wang L, Yang R G, et al. Reliability joint depth and alpha matte optimization via fusion of stereo and time-of-flight sensor[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '09), Miami, Florida, USA, June 20-26, 2009.
[54] Hansen D W, Hansen M S, Kirschmeyer M, et al. Cluster tracking with time-of-flight cameras[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW '08), Anchorage, Alaska, USA, June 23-28, 2008.
[55] Cai Q, Gallup D, Zhang C, et al. 3D deformable face tracking with a commodity depth camera[C]//Proceedings of the 11th European Conference on Computer Vision (ECCV '10), Crete, Greece, September 5-11, 2010: 229-242.
[56] Penne J, Soutschek S, Fedorowicz L, et al. Robust real-time 3D time-of-flight based gesture navigation[C]//Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition (FG 2008), Amsterdam, Netherlands, September 17-19, 2008.
[57] Holte M, Moeslund T. View invariant gesture recognition using the CSEM SwissRanger SR-2 camera[J]. International Journal of Intelligent Systems Technologies and Applications: Issue on Dynamic 3D Imaging, 2008, 5(3/4): 295-303.
[58] Holte M, Moeslund T, Fihl P. Fusion of range and intensity information for view invariant gesture recognition[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW '08), Anchorage, Alaska, USA, June 23-28, 2008.

[59] Acharya S, Tracey C, Rafii A. System design of time-of-flight range camera for car park assist and backup applications[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW '08), Anchorage, Alaska, USA, June 23-28, 2008.
[60] Swadzba A, Beuter N, Schmidt J, et al. Tracking objects in 6D for reconstructing static scenes[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshop on Time of Flight Camera Based Computer Vision (CVPRW '08), Anchorage, Alaska, USA, June 23-28, 2008.
[61] Ghobadi S E, Loepprich O E, Ahmadov F, et al. Real time hand based robot control using 2D/3D images[C]//Proceedings of the International Symposium on Advances in Visual Computing (ISVC '08), Las Vegas, Nevada, USA, December 1-3, 2008: 307-316.
[62] Dolson J, Baek J, Plagemann C, et al. Upsampling range data in dynamic environments[C]//Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '10), San Francisco, USA, June 13-18, 2010: 1141-1148.
[63] Halit B S, Sonia C. Humanoid robot control using depth camera[C]//Proceedings of the 6th International Conference on Human-Robot Interaction (HRI '11), EPFL, Lausanne, Switzerland, March 6-9, 2011: 401-402.

XIANG Xueqin was born in 1981 in Mayang, Hunan. He is a Ph.D. candidate at Zhejiang University. His research interests include computer vision and depth cameras.

PAN Zhigeng was born in 1965 in Huai'an, Jiangsu. He is a professor and doctoral supervisor at Zhejiang University and a senior member of CCF. His research interests include virtual reality, human animation, human-computer interaction and edutainment.

TONG Jing was born in 1981 in Yangzhou, Jiangsu. He is a Ph.D. candidate at Zhejiang University. His research interests include computer graphics and 3D animation.