Machine Vision and Applications (2011) 22:563–579
DOI 10.1007/s00138-010-0248-1
ORIGINAL PAPER
Hand-held 3D scanning based on coarse and fine registration of multiple range images
Soon-Yong Park · Jaewon Baek · Jaekyoung Moon
Received: 10 March 2009 / Revised: 25 November 2009 / Accepted: 14 January 2010 / Published online: 9 February 2010
© Springer-Verlag 2010
Abstract A hand-held 3D scanning technique is proposed
to reconstruct 3D models of real objects. A sequence of range images captured from a hand-held stereo camera is automatically registered to a reference coordinate system. The automated scanning process consists of two states, coarse and
fine registration. At the beginning, scanning process starts
at the fine registration state. A fast and accurate registra-
tion refinement technique is used to align range images in
a pair-wise manner. If the refinement technique fails, the
process changes to the coarse registration state. A feature-
based coarse registration technique is proposed to find correspondences between the last successful frame and the current frame. If the coarse registration succeeds, the process returns
to the fine registration state again. A fast point-to-plane
refinement technique is employed to do shape-based reg-
istration. After the shape-based alignment, a texture-based
refinement technique matches texture features to enhance
visual appearance of the reconstructed models. Through a
graphic and video display, a human operator adjusts the pose
of the camera to change the view of the next acquisition.
Experimental results show that 3D models of real objects are
reconstructed from sequences of range images.
S.-Y. Park (B)
Department of Computer Engineering,
Kyungpook National University, Daegu, 702-701, Korea
e-mail: sypark@knu.ac.kr
J. Baek
NeoMtel Incorporation, Seoul, 135-080, Korea
J. Moon
School of Electronics, Electrical and Computer Science,
Kyungpook National University, Daegu, 702-701, Korea
Keywords Hand-held 3D scanning · Multi-view range images · Registration refinement · Coarse registration
1 Introduction
Three dimensional (3D) range data of an object can be
acquired from several sensing techniques such as laser rang-
ing, structured light, stereo vision, and so on. Due to the
recent advances in such sensing techniques, multi-view 3D
reconstruction of real objects is of great interest in computer vision and computer graphics research. To reconstruct a complete 3D model of an object, multiple range
images should be acquired from different viewing directions
to obtain the partial shapes of the object. Then, the range
images need to be merged to obtain the 3D model. Commonly, multi-view 3D reconstruction means the complete process of 3D model generation from multiple range data.
Multi-view reconstruction, however, is a very time-consuming task. One reason is the acquisition time of multiple range images. Even though some real-time 3D acquisition systems have been introduced recently, general 3D ranging techniques require a lot of time to obtain multiple range images.
For example, the laser ranging technique requires many
images of laser stripes even for a single view reconstruc-
tion. Therefore, reconstruction of a large scene usually takes
several hours. Similarly, the structured light technique needs several or many images of specially designed patterns
to obtain a single range image. The stereo vision technique
has an inherent matching problem, which is complex and time-consuming to solve.
Another reason is the alignment problem of multi-view
range images. Each range image captured by a range sensor is
represented by an independent coordinate system regardless
of sensor pose. Therefore, range images of different coor-
dinate systems should be aligned with respect to a com-
mon coordinate system before being merged into a single
3D model. This process is called registration. Sometimes,
the initial poses of range images are known if they are cap-
tured by a calibrated range sensor. However, the initial poses
should be refined to reconstruct accurate 3D models. Registration refinement is a very important task in multi-view 3D reconstruction. The accuracy of a reconstructed 3D model is directly affected by the accuracy of refinement. For this reason, many registration refinement techniques have been investigated in recent years.
Besl and McKay [2] propose the idea of the ICP (Iterative Closest Point) technique. Registration refinement minimizes the transformation error between matching conjugates in different range images (actually, range surfaces). ICP takes the closest points between them as the matching conjugates. Based on the ICP technique, various extensions have been investigated; some of them are as follows. Johnson and Kang [7] modify the ICP technique to combine color information in registration. Levoy et al. [9]
introduce a voxel-based registration technique. Their tech-
nique needs triangulation of range images to measure signed
distance from a voxel to overlapping range images. Huber and Hebert [5] introduce a graph-based registration technique which can work without initial poses of range images. Urfalioglu et al. [22] use a global optimization technique for
uncalibrated registration. ICP can be used for large scale city
modeling. Akbarzadeh et al. [1] combine INS and GPS data
to register multi-view images for city-modeling.
Recently, there have been some investigations into reducing the time of multi-view 3D reconstruction based on on-line or interactive methods. These methods are categorized into two scanning types. The first scanning type uses a hand-held sensor
to scan a fixed object. The other type uses a fixed sensor to
scan a hand-held object. Figure 1 shows two different types of on-line scanning.
Fig. 1 Registration problem of a hand-held 3D sensor
In the left of the figure, an object is held
by hand and a fixed sensor is used to capture range images.
Suppose the object in hand is rotated by a small angle and
the two range images are obtained by the sensor before and
after rotation. Then, the two poses of the object in the images
are close enough to run a refinement algorithm to align the
images. This is due to the fixed camera coordinate system.
On the contrary, suppose a hand-held sensor is rotated by the same angle as shown in the right of the figure, then the
displacement of the object in the two range images becomes
much larger than that of the fixed sensor due to the rotation
of the camera coordinate system.
Because each range image of a hand-held sensor is repre-
sented by an independent camera coordinate system, unsta-
ble scanning motion can cause registration failure. Figure 2
shows an example. Here, we scan an object with the same
distance and direction, but different scanning speed. In
Fig. 2a, registration of range images fails, thus the figure
shows incorrect position and orientation of the sensor (small
white dots are camera centers and short red lines are their orientations). With slower scanning speed, aligned range
images and correct camera poses are obtained as shown in
Fig. 2b.
On-line 3D modeling using a hand-held sensor has inherent problems, as mentioned in the previous paragraphs. For this reason, only a few investigations address the on-line 3D
scanning or modeling problems. Liu and Heidrich [10] pro-
pose a stereo-based on-line registration system based on real
time processing hardware. Jaeggli and Koninckx [6] acquire
multi-view range images using a pattern projector and reg-
ister them on-line. Their approach uses a fixed sensor and
a turn table to rotate an object. Therefore, initial poses of
range images are known in advance. Popescu et al. [15,16]
propose a real-time modeling system using a scanning rig
which consists of calibrated laser dots and a video camera.
The 3D orientation of projected laser dots are recorded and
registered in about five frames per second. Their approach
reconstructs 3D surface models by triangulation of sparse
3D points. Rusinkiewicz et al. [17] propose a real-time reg-
istration system using a pattern projector and a video cam-
era. A pattern projector is used to project coded patterns to
a hand-held object. 3D point clouds of the object are regis-
tered in near real-time using a point-to-projection refinement
technique. Se and Jasiobedzki [19] introduce a 3D modeling
system using a hand-held stereo camera. They use texture
features to register a sequence of range images for crime
scene reconstruction. Yun et al. [24] use a hand-held stereo camera to acquire range images of indoor scenes. Their
off-line process of 3D reconstruction and model registration
uses the well-known SIFT algorithm. Hilton and Illingworth
[4] use a laser sensor attached at the end of an articulated
robotic arm. The arm has six degrees of freedom and the pose
of the sensor is measured with respect to a global coordinate
Fig. 2 a Erroneous registration,
b successful registration
system. Therefore, their approach does not need to consider the registration problem. Matabosch et al. [12] propose an on-line registration technique which minimizes the propagation error inherent in pair-wise registration. They use a
point-to-plane registration technique which is similar to that
used in our method.
In this paper, we propose an on-line 3D scanning technique which is based on a hand-held stereo sensor. A sequence of range images captured from the sensor is registered on-line. The on-line scanning process consists of two states, refinement and coarse registration. The process begins at the refinement state. A registration refinement technique continuously
registers range images in a pair-wise manner. To overcome
refinement failure due to unstable hand motion, a fast coarse
registration technique is proposed. If the refinement tech-
nique fails, scanning process changes to the coarse registra-
tion state. At this state, we match shape features between two
different range images, and register them using the matching
results. To reduce matching time, we sample depth edges as
shape features. The matching process is done in a hierarchical manner for fast processing.
The coarse registration estimates the initial pose between
a pair of range images so that the refinement technique can
resume. Once the initial pose is close enough, the refinement
step begins to register subsequent frames again. The refine-
ment technique uses both 3D shape and texture informa-
tion for accurate 3D model reconstruction. Sampled texture
features by the KLT tracker are used in both shape-based
and texture-based registration [21]. Using a graphic and
video display, a human operator can see registered range
images and plan the next view interactively. Such interac-
tion enhances the 3D scanning performance.
Experimental results show that the proposed technique can register a long sequence of range images. Using the stereo sensor, we register range images at 1.2 frames per second. Error analysis of 3D reconstruction results in 1.2 mm average error with 1.8 mm standard deviation. Also, 3D models
of real indoor and outdoor objects are shown.
2 System overview
Figure 3 shows a flow diagram of the proposed 3D scanning
technique. Acquisition of multi-view range images is done
by a stereo vision camera, the BumbleBee from Point Grey Research [26]. Both range and texture images are acquired
simultaneously from the camera. To separate foreground
objects from the background, we remove the portions of range images which are outside the working range of 0.3 to 2 m. A range image captured from the
camera is registered to the previously aligned range image.
Therefore, all acquired range images are registered in a pair-
wise manner. The first frame from the camera becomes the
reference frame and the others are registered to the reference
frame.
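As a concrete illustration, the working-range cut described above can be sketched in a few lines. The following Python/NumPy fragment is our own illustration (the function name and the NaN convention for invalid pixels are assumptions, not from the paper):

```python
import numpy as np

def extract_foreground(depth, near=0.3, far=2.0):
    """Mask out background pixels outside the working range (0.3 to 2 m).

    depth : (H, W) range image in meters.
    Returns a copy with out-of-range pixels set to NaN.
    """
    out = depth.astype(float).copy()
    # Pixels closer than `near` or farther than `far` are background.
    invalid = ~((out >= near) & (out <= far))
    out[invalid] = np.nan
    return out
```

With the background removed, subsequent feature selection and registration only operate on foreground depth values.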
For pair-wise registration, 3D points from two range
images are sampled. The samples are chosen by the feature selection routine of the KLT tracker [20,21]. Because the KLT features are usually sampled from high-contrast textures, they are good features to track in a 2D sequence with small motion. Therefore, later in the modeling process, we combine them with the shape-based registration refinement technique.
To use the 3D features associated with the 2D features, we remove KLT features which have no depth values.
Because background ranges are removed beforehand, 2D
features in the background are removed too. 3D features on
the range images are registered iteratively by shape-based
registration followed by texture-based registration. From the
given 3D features in the current range image, their corre-
spondences are determined by a point-to-plane registration
technique [13]. The transformation between the current and
Fig. 3 Flow diagram of the
proposed 3D registration
technique
[Flow diagram text: range/texture acquisition → foreground extraction → 2D/3D feature selection → point-to-plane registration → texture-based registration; when the number of control points falls below 10, the process switches between coarse mode and refinement, and scanning ends when enough frames are registered.]
the previous range images is derived in a least-squares manner.
After shape-based registration, a modified KLT tracker
refines the registration result again. 2D features are used
in the texture-based registration step. Owing to the shape-
based registration refinement, projections of the 2D fea-
tures in one texture image are close to their correspondences
in the other image. Thus, the processing time of feature
matching is reduced. We modified the original KLT track-
ing algorithm to make the correspondence search fast and
accurate [14].
The 3D scanning scheme consists of two states. As shown
in Fig. 4, the scanning scheme starts at the fine registra-
tion state. In each registration refinement step, we measure
registration error to determine if the alignment is success-
ful or not. If the current registration is determined to fail,
we change the state of scanning to the coarse registration
state. Once a human operator fail to register a range image,
based on our several experiments, it is difficult for him to
align the sensor very close to the last successful pose. Even
though a graphical user interface is displayed on-line during
acquisition, using only a refinement algorithm is not enough
to recover the failure. Instead, we introduce a coarse reg-
istration technique. After successful coarse registration, we
resume registration refinement. The scanning process continues until enough range images are obtained and registered.
[State diagram: the process starts at pairwise refinement; on failure, measured by the registration error, it switches to coarse registration, and on success it returns to refinement; scanning stops when registration is complete.]
Fig. 4 State diagram of the proposed 3D scanning process
3 Registration refinement
3.1 Shape-based range image refinement
In this section, we describe a shape-based registration refine-
ment technique. In general, there are three main categories of
shape-based registration techniques. Figure 5 shows simple
diagrams of them. In point-to-point registration, the conjugate of a control point p on the source surface is determined
as the closest point q on the destination surface. An error
metric ds is the distance between two control points.
Point-to-plane registration is another common technique.
It searches the intersection on the destination surface from
the normal vector of the source point. As shown in Fig. 5b, the destination control point q' is the projection of p onto the tangent plane at q, which is the intersection from the normal of p. The point-to-projection approach is known to be a fast registration technique. As shown in Fig. 5c, this approach determines a point q, which is the conjugate of a source point p, by forward-projecting p from the point of view of the destination OQ. In order to determine the projection point, p is
first backward-projected to a 2D point pQ on the range image
plane of the destination surface, and then pQ is forward-pro-
jected to the destination surface to get q. This algorithm is
very fast because it does not include any searching step to
find the correspondence. However, one of its disadvantages
is that the result of registration is not as accurate as those of
the others [17].
In this paper, we use a point-to-plane technique called the IPP (iterative projection point) method for shape-based registration [13]. This technique is combined with the point-to-projection technique to reduce processing time. Let us briefly explain the IPP method. To align two surfaces S and D shown in Fig. 6, we need to find correspondences between the two surfaces. For example, P0 in S corresponds to Q in surface D, which is the intersection of the normal vector at P0. First, we project P0 into ID, the 2D image of surface D, and find the coordinate pD. Then we can find the point Qr that corresponds to the coordinate pD using the range image of surface D. A 3D point P1 is found by projecting the point Qr onto the normal of P0. Applying this iteratively n times, we can find Q, which is the convergence point of Pn. Q is then determined by the
Fig. 5 Three categories of registration techniques. a Point-to-point, b point-to-plane, c point-to-projection
Fig. 6 Iterative projection point algorithm
projection of P0 onto the tangent plane of Q. Using K corresponding pairs P_0^k and Q_k, we find the transformation matrix T = [R|t] that minimizes the registration error between the two point sets using Eq. (1). We use the SVD (singular value decomposition) method to solve the equation.

\varepsilon = \sum_{k=1}^{K} \left\| Q_k - (R P_0^k + t) \right\|^2 . \quad (1)
This process calculates the Euclidean transformation matrix T between the two surfaces repeatedly. The registration error is measured using the rotation R and translation t of the transformation. We find that the registration error typically does not decrease after about 30 iterations. In general, the source control point set P can be selected by sampling the source surface randomly or uniformly, and filtered by some constraints to delete unreliable control points. The selection of IPP control points is described in the next section.
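The least-squares problem of Eq. (1) has a well-known closed-form solution based on the SVD. The following Python/NumPy sketch illustrates one such solution (a standard Kabsch-style formulation; the function name and array conventions are ours, not from the paper):

```python
import numpy as np

def rigid_transform_svd(P, Q):
    """Find R, t minimizing sum_k || Q_k - (R P_k + t) ||^2 (cf. Eq. 1).

    P, Q : (K, 3) arrays of corresponding 3D points.
    """
    # Center both point sets on their centroids.
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - cp, Q - cq
    # Cross-covariance matrix and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections (det = -1) before forming the rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

In the IPP loop, this solve is repeated after each correspondence update until the registration error stops decreasing.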
3.2 Selection of IPP control points
To run IPP, some 3D points must be sampled as control points. The sampling of control points usually affects registration performance [18]. In this paper, a couple of constraints are applied
to select the control points as follows.
– Texture features of the KLT tracker are used as the control points.
– Invalid features are removed if they are in the background or have no 3D correspondences. If the distance between 3D correspondences is too long, they are regarded as invalid correspondences.
– As shown in the left of Fig. 7, only 2D features in the object area are used in both the shape-based and texture-based refinement steps.
– In the right of the figure, the normal vector of each point is compared with that of the viewpoint to remove unreliable
Fig. 7 Feature point selection. Reliable KLT features are sampled and
out of range points are removed
Fig. 8 a Pair-wise registration problem, b initialization using the previous transformation
features. We remove a control point if the dot product of
two vectors is greater than 0.5.
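A minimal sketch of this control-point filtering might look as follows (Python/NumPy; the data layout, the NaN convention for missing depth, and the function name are our assumptions, not the paper's):

```python
import numpy as np

def select_control_points(features_2d, depth, normals, view_dirs,
                          dot_threshold=0.5):
    """Filter KLT features into IPP control points per the constraints above.

    features_2d : list of (u, v) integer pixel coordinates of KLT features.
    depth       : (H, W) range image; NaN marks background/holes.
    normals     : (N, 3) unit surface normals at the features.
    view_dirs   : (N, 3) unit viewing vectors at the features.
    Returns the indices of the surviving features.
    """
    keep = []
    for k, (u, v) in enumerate(features_2d):
        # Drop features with no depth (background was removed beforehand).
        if np.isnan(depth[v, u]):
            continue
        # Drop unreliable points: following the paper's stated rule, a
        # feature is rejected when the dot product of its normal and the
        # viewing vector exceeds 0.5.
        if np.dot(normals[k], view_dirs[k]) > dot_threshold:
            continue
        keep.append(k)
    return np.array(keep, dtype=int)
```

The distance check on 3D correspondences is omitted here, since it naturally belongs to the matching step rather than this per-frame filter.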
3.3 Pair-wise registration
When consecutive range images are registered to a reference coordinate system, the initial registration error needs to be small. Figure 8a shows a simple case. After the shape Sb is registered to Sa, the next shape Sc should be registered to Sb, which is aligned already. However, there is a large initial error between Sb and Sc, which makes it difficult for shape Sc to be aligned to Sb. If the initial registration error εi between two range images is large, the refinement step is likely to fail to align them.
A simple solution to this problem is transforming the current range image based on the previous alignment result. This brings the current range image into the coordinates of the previous range image, which is already in the common coordinate system. In Fig. 8b, the transformation Tab which brings shape Sb to Sa is applied to surface Sc to initialize registration.
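This warm start can be sketched as follows (Python/NumPy; the function name and the 4 × 4 homogeneous-matrix convention are ours):

```python
import numpy as np

def initialize_next_pose(T_prev_pair, points_current):
    """Warm-start registration of the next frame (cf. Fig. 8b).

    T_prev_pair    : 4x4 homogeneous transform that aligned the previous
                     pair (e.g. T_ab, which brought S_b onto S_a).
    points_current : (N, 3) points of the new range image S_c.
    Returns the pre-transformed points, assuming the inter-frame motion
    is roughly constant so the previous transform is a good initial guess.
    """
    homog = np.hstack([points_current, np.ones((len(points_current), 1))])
    return (homog @ T_prev_pair.T)[:, :3]
```

Refinement then starts from these pre-transformed points instead of the raw sensor coordinates, which keeps the initial error small.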
3.4 Texture-based registration method
Texture-based registration techniques employ 2D feature correspondences to estimate the 3D pose between range images
Fig. 9 Comparison of KLT search range a before and b after shape-
based registration
[3,23,24]. When there is a large pose difference, tracking of 2D texture features is not easy due to the wide baseline between views. In consequence, tracking time increases and the success rate decreases. One may filter out unsuccessful features by using a RANSAC algorithm; however, it also takes extra time for pose estimation. For accurate tracking, the size of the search region and tracking windows must be traded off against tracking performance and time.
In this paper, we improve the performance of correspondence tracking by reducing the search range. The original KLT
tracking algorithm is modified by employing the results of
shape-based registration. In the original KLT tracker, the coordinates of the starting point of the correspondence search in the current image are always the same as those in the previous image. In Fig. 9a, a feature point pS in image plane IS is also shown in the other image plane ID. The original KLT tracker starts searching for its correspondence qD from the coordinates of pS. Therefore, the existing KLT tracker needs a wide search area to find the correspondence if images are obtained from much different viewing directions.
Figure 9b shows the modified KLT method used in our experiments. When the range of a 2D feature pS is denoted as rS(pS), we can compute pS from the projection of rS(pS) to ID.
Fig. 10 Feature tracking comparison. a Modified KLT, b original KLT
When the range image rS is registered by the transformation TS, it is projected to image ID by multiplying by the perspective projection matrix M. The projection matrix is obtained from the calibrated camera in advance.

p_S = M T_S r_S(p_S) . \quad (2)
When the shape-based registration result is considered, the starting coordinates of the KLT tracker become very close to the correspondence. This, however, assumes that the shape-based registration has a small registration error. In most cases, the registration result is accurate enough to bring the 2D correspondences within a very close distance. Thus, using a small search region, we can track texture features very fast to further refine the pose.
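The seeding step of Eq. (2) can be sketched as follows (Python/NumPy; the function name is ours, and M is assumed to be a 3 × 4 perspective projection matrix from the calibrated camera):

```python
import numpy as np

def seed_search_point(p3d, T_S, M):
    """Project a registered 3D feature into the destination image (Eq. 2).

    p3d : (3,) 3D position r_S(p_S) of the feature in the source frame.
    T_S : 4x4 transform from the shape-based (IPP) registration.
    M   : 3x4 perspective projection matrix of the calibrated camera.
    Returns the pixel where the modified KLT tracker starts its
    (now much smaller) correspondence search.
    """
    X = T_S @ np.append(p3d, 1.0)   # transform into the destination frame
    x = M @ X                       # project to homogeneous image coords
    return x[:2] / x[2]             # perspective divide -> pixel (u, v)
```

Because the seed already lies near the true correspondence, the KLT search window can be shrunk without losing tracks.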
Figure 10 shows an example of texture feature tracking.
In the figure, the green dots represent texture features and the
white lines represent tracking results between two consecutive images. Using a smaller tracking window than that of the original KLT, almost all features are successfully tracked
in the modified KLT. Because most features are success-
fully tracked, we can estimate the transformation without
any RANSAC algorithm.
4 Handling registration failure
Hand-held 3D scanning is a convenient way to reconstruct
3D models of real objects. However, one drawback of this
method is that the view of the 3D sensor is controlled by
hand. One may think that hand motion is very controllable
by a human operator. However, in terms of 3D registration,
a very small motion of the hand yields a large displacement between consecutive images. For this reason, conventional 3D scanning is done by off-the-shelf systems.
To reconstruct 3D shapes of real objects, we need to
acquire as many range and texture images as possible. Therefore, it is necessary to cope with the hand motion during scanning. During the registration refinement state, we check if the
alignment of the current image is successful or not. If not,
we move the state to coarse registration. In the coarse reg-
istration state, a range image is captured and matched with
the previously aligned range image. If it is not successful,
the capturing and matching steps are done repeatedly until
it is successful. If the coarse registration is successful, the
scanning state returns to the refinement step. The decision on coarse registration is made by measuring the pose error between the correspondences. Let T_{n-1} and T_n be the transformation matrices of the (n-1)th and nth range images. Then the transformation

T_{n-1,n} = T_{n-1} T_n^{-1} \quad (3)

can be considered as an error between them. To measure the registration error between the (n-1)th and nth range images, we need to multiply T_{n-1} and T_n^{-1} with the corresponding range images, and compute the rotation and translation error between them. From the matrix T_{n-1,n} = [r_{ij} | t_i], the rotation and translation errors are computed as follows:

\varepsilon_R = \frac{\sum_{i=0}^{2} (I_{ii} - r_{ii})^2}{3} , \quad (4)

\varepsilon_t = \frac{\sum_{k=1}^{K} \left\| Q_k - P_0^k \right\|}{K} , \quad (5)

where K is the number of correspondences between the images.
Rotation error is measured by the difference between the
identity matrix and the rotation matrix. Rather than using
the translation vector in Tn1,n, we use the average distance
between the correspondences as the measure of translation.
As mentioned in an earlier section, we provide the human operator with a graphical user interface to plan the view of the camera and check the status of on-line scanning.
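The error measures of Eqs. (4) and (5) can be sketched as follows (Python/NumPy; names are ours, and the threshold logic that actually triggers the state change is omitted):

```python
import numpy as np

def registration_error(T_rel, Q, P0):
    """Rotation/translation error used to decide on coarse registration.

    T_rel : 4x4 relative transform T_{n-1,n} between consecutive frames.
    Q, P0 : (K, 3) corresponding point sets in the two frames.
    """
    R = T_rel[:3, :3]
    # Eq. (4): mean squared deviation of the rotation's diagonal from
    # that of the identity matrix.
    err_R = np.sum((np.eye(3).diagonal() - R.diagonal()) ** 2) / 3.0
    # Eq. (5): average distance between corresponding points, used
    # instead of the raw translation vector.
    err_t = np.mean(np.linalg.norm(Q - P0, axis=1))
    return err_R, err_t
```

When these errors exceed the chosen thresholds, the scanner leaves the refinement state and falls back to coarse registration.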
5 Coarse registration
5.1 Sampled depth-edge block matching
In this section, we propose a coarse registration technique to
match two wide baseline range images. We call our coarse
123
8/3/2019 Hand-Held 3D Scanning
8/17
570 S.-Y. Park et al.
Fig. 11 Matching strategy of
SDBM
Fig. 12 Sampling of depth
edge points. From left, original
depth edge, uniform sampling,
and complete depth points
Fig. 13 An example of SDBM.
a Matched depth points,
b before registration, c after
registration
registration method SDBM (Sampled Depth-edge Block Matching). Suppose there are two range images, source and
destination as in Fig. 11. The source range image can be
considered as the current image and similarly the destination
image as the last range image aligned already in the reference
coordinate system. From the source image, we pick some
shape features which are originally on the edge of the range
image. Edge features in the range image are chosen because
they are independent of texture change. Edge features can
be acquired directly from the range image. In this paper, we
apply the Sobel filter to find edge features. A fixed threshold
value is used to determine edge features. Determining the
Fig. 14 Ten test pairs of coarse
registration. From left,
destination and source frames,
Initial poses, depth and features
of the two frames. From top,
BT1, BT2, SB1, SB2, SM1,
SM2, SC1, SC2, SL1, and SL2
threshold value is not so critical in this case because the edge
features are sampled later to reduce the number of features.
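Depth edge detection with a Sobel filter and a fixed threshold might be sketched as follows (plain Python/NumPy loops for clarity; in practice a library convolution would be used, and the threshold value is our placeholder):

```python
import numpy as np

def depth_edges(depth, threshold):
    """Detect depth edges with a Sobel filter and a fixed threshold.

    depth : (H, W) float range image. Returns a boolean edge mask.
    The exact threshold is not critical, because the edge points are
    subsampled afterwards anyway.
    """
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    H, W = depth.shape
    gx = np.zeros_like(depth)
    gy = np.zeros_like(depth)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            win = depth[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(win * kx)   # horizontal depth gradient
            gy[i, j] = np.sum(win * ky)   # vertical depth gradient
    return np.hypot(gx, gy) > threshold
```

The resulting mask marks candidate depth edge points, which are then uniformly subsampled as described below.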
Given a sample point si in the source, we find a matching
point in the destination range image. To find the matching
point, we define two search regions, R1 and R2. A square
region R1 is defined by placing its center at the same coordinate as that of si. In R1, let one of the depth edge points be di. Another square region R2 is defined similarly with
Fig. 15 Results of coarse
registration. From left to right
Point clouds and textured results
of KLT, SPIN, and SDBM. The
point clouds show the poses
before applying IPP, but the
textured results show after
applying IPP
its center at di. In R2, a matching window WD is defined at each point di, and its cost is measured with respect to WS, the matching window of si. The second search region R2 is defined at every destination point in R1. Therefore, the matching pair (si, di) is determined as the one yielding the least cost value between WS and WD.
Let nW, nR2, and nR1 be the number of depth points in the matching window, the number of valid points in R2, and
Table 1 Comparison of registration error (mm)
Object Method Initial Coarse Coarse + IPP S(uccess)/F(ail)
BT1 KLT 48.3 19.1 0.9 S
SPIN 63.7 42.3 1.3 S
SDBM 40.3 10.6 1.2 S
BT2 KLT 43.7 30.9 0.8 S
SPIN 74.5 64.8 1.1 S
SDBM 37.1 2.7 0.7 S
SB1 KLT 7.9 F
SPIN 115.4 96.9 2.6 S
SDBM 54.2 43.8 3.9 S
SB2 KLT 48.4 25.9 1.5 S
SPIN 84.9 79.3 3.1 S
SDBM 48.5 33.9 2.8 S
SM1 KLT 55.5 21.6 0.5 S
SPIN 62.8 40.5 2.7 S
SDBM 57.3 6.6 0.5 S
SM2 KLT 87.8 59.2 5.9 S
SPIN 138.8 130.9 130.9 F
SDBM 72.5 64.8 6.9 F
SC1 KLT 42.4 8.7 2.6 S
SPIN 54.1 24.3 2.2 S
SDBM 42.4 0.9 0.8 S
SC2 KLT 24.4 8.7 0.7 S
SPIN 44.7 43.7 2.6 F
SDBM 24.1 1.2 0.7 S
SL1 KLT 28.8 14.4 4.3 F
SPIN 38.2 10.0 0.4 S
SDBM 38.1 0.8 0.3 S
SL2 KLT 30.6 17.9 3.4 F
SPIN 103.0 88.2 4.0 F
SDBM 61.1 6.2 0.6 S
the number of depth edge points in R1, respectively. Then the computational complexity of finding the matching point is O(n_W n_{R2} n_{R1}). The cost C(s_i, d_i) between the two matching windows is measured by the mean-normalized SSD (sum of squared differences) as follows:
\bar{W}_S = \frac{1}{n_W} \sum_{i,j \in W_S} r_S(i, j), \qquad \bar{W}_D = \frac{1}{n_W} \sum_{i,j \in W_D} r_D(i, j), \quad (6)

C(s_i, d_i) = \frac{1}{n_W} \sum_{i,j \in W_S} \bigl( (r_S(i, j) - \bar{W}_S) - (r_D(i, j) - \bar{W}_D) \bigr)^2 , \quad (7)
where rS(i, j) and rD(i, j) are the depth values at pixel (i, j) of the matching windows.
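The cost of Eqs. (6)–(7), together with a simplified single-region version of the SDBM search, might be sketched as follows (Python/NumPy; the names, and the reduction to one search region, are our simplifications of the paper's two-region scheme):

```python
import numpy as np

def mn_ssd(win_s, win_d):
    """Mean-normalized SSD between two depth windows (Eqs. 6-7)."""
    diff = (win_s - win_s.mean()) - (win_d - win_d.mean())
    return np.sum(diff ** 2) / win_s.size

def match_edge_point(src_depth, dst_depth, s, dst_edges, half_w, r1):
    """Find the destination match of a source edge point s (simplified SDBM).

    Scans destination edge points inside a square region R1 of half-size
    r1 centered on s's coordinates and returns the one with least
    mean-normalized SSD cost. (The paper adds a second region R2 and a
    cost threshold; this sketch keeps only the core window search.)
    """
    si, sj = s
    win_s = src_depth[si - half_w:si + half_w + 1, sj - half_w:sj + half_w + 1]
    best, best_cost = None, np.inf
    for di, dj in dst_edges:
        if abs(di - si) > r1 or abs(dj - sj) > r1:
            continue                      # outside the search region R1
        win_d = dst_depth[di - half_w:di + half_w + 1,
                          dj - half_w:dj + half_w + 1]
        if win_d.shape != win_s.shape:
            continue                      # window falls off the image
        c = mn_ssd(win_s, win_d)
        if c < best_cost:
            best, best_cost = (di, dj), c
    return best, best_cost
```

Subtracting the window means makes the cost insensitive to a constant depth offset between the two views, which is the point of the mean normalization.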
The depth edge points used in the matching algorithm are sampled from the original range images. As mentioned before, only depth edge points are used, to exploit the high curvature at the edges.
Fig. 16 Graphic user interface to assist a user in adjusting the sensor pose
To make the matching process fast and reliable, we reject some depth edge points as follows. First, depth edge points are sampled as shown in Fig. 12. In the left image, red-colored (dark-grey) points are edge points sampled from a range image. Second, we uniformly sample again to reduce the number of edge points, as shown in the center of the figure. Finally, we remove some incomplete points. Here, we regard a point as incomplete if any neighborhood pixel in the matching window W is a hole.
We perform the depth point sampling in both range images. The local matching point in R2 is determined by

\mathrm{local}(d_i) = \arg\min_{d_j \in R_2} C(s_i, d_j) . \quad (8)

The final matching point is the global minimum for s_i, which can be computed as

\mathrm{global}(s_i) = \arg\min_{d_j \in R_1} C(s_i, \mathrm{local}(d_j)) . \quad (9)
Even though a destination point is found to be the global minimum, its cost C(si, di) should be less than a threshold value. If not, we reject such a point. Figure 13 shows an example of coarse matching. In Fig. 13a, solid lines show matching pairs of features between two range images. In Fig. 13b and c, the two 3D shapes are displayed together to compare before and after coarse registration. In this case, 10 matching pairs are used for registration. The matching window size is 15 × 15, and the sizes of R1 and R2 are 100 and 50, respectively. Computing the transformation matrix from the matching pairs is done by the same method as that of fine registration. Registration errors before and after registration are 38.54 and 3.58 mm, respectively. The initial
Fig. 17 Results of Beethoven. a Original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model
registration error is reduced enough to start refinement.
5.2 Comparison of coarse matching

The proposed coarse registration technique can be considered a 3D shape matching technique. To match 3D shapes obtained from different views, other conventional shape matching techniques can also be considered; the spin image is one example. In addition, conventional 2D matching techniques can be used, because a texture image is associated with every range image. In this section, we compare the performance of our 3D matching technique with two other techniques, Spin image [8] and KLT [21].

Spin image is a 3D shape matching technique. A spin image is a 2D space of α and β, which are a mapping of the 3D positions of
Fig. 18 Results of Sacheonwang: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model
a point with respect to its neighboring surface points. To run the Spin image technique, the image size is set to 40 × 40 and the length of each bin is set to 8 mm, which covers 160 mm from the measured point. To run KLT, the original KLT algorithm is used, and the size of the matching window is set to 15 × 15, the same as that of our matching block W. For a fair comparison, we use the same number of features as are extracted by SDBM.
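For reference, a minimal version of the spin-image computation [8] with the parameters quoted above (40 × 40 image, 8 mm bins) might look as follows; the exact binning and normalization used in [8] may differ:

```python
import numpy as np

def spin_image(p, n, points, bin_size=8.0, size=40):
    """Sketch of a spin image [8] for one oriented point.

    p: 3D point, n: unit surface normal at p, points: (N, 3) neighbors.
    bin_size (mm) and size follow the parameters in the text
    (40 x 40 image, 8 mm bins); the binning details are assumed.
    """
    img = np.zeros((size, size))
    d = points - p
    beta = d @ n                                   # signed height along the normal
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta**2, 0.0))
    # alpha spins around the normal axis; beta is signed, so it is
    # shifted so the image covers +/- (size/2 * bin_size) = 160 mm.
    i = np.floor(beta / bin_size + size / 2).astype(int)
    j = np.floor(alpha / bin_size).astype(int)
    valid = (i >= 0) & (i < size) & (j >= 0) & (j < size)
    np.add.at(img, (i[valid], j[valid]), 1)        # accumulate point counts
    return img
```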
Five objects are used for this comparison, as shown in Fig. 14. Two test frames, shown in the first and second columns of the figure, are sampled from the video sequence of each object. The corresponding range images are shown in the
Fig. 19 Results of Natural scene: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model
last two columns. The test range images are sampled so that the initial pose difference becomes too wide for our refinement algorithm to register them. In the range images, the depth features extracted by SDBM are overlaid. In the third column, the two range images are overlapped to show the initial pose between the two frames.
Figure 15 shows the coarse registration results. The first three columns show the results of KLT, SPIN, and SDBM. The second three columns show the same results with textured points. KLT and SDBM show very similar results, while SPIN sometimes yields erroneous results. The main reason is that SPIN has more degrees of freedom than KLT and SDBM due to its rotation and scale invariance. The first three columns of the figure show the pose right after applying each technique. The last three columns show textured range images after additionally applying IPP following SDBM. Table 1 shows the registration error measured between two range images after applying each matching technique. In the table, the decision of success
or failure is given by visual inspection of the results. Registration error is measured as the average distance between matching pairs.
6 Experimental results

The stereo vision camera generates 15 range images per second. The resolution of a range image is 320 × 240. The proposed 3D scanning technique is applied to two real objects and one natural scene. A computer with a Pentium 3.4 GHz CPU is used. It takes about 800 ms to register each range image when both shape-based and texture-based registration refinements are applied. When only the shape-based registration is used, about 600 ms is needed.
6.1 Online graphic interaction

To register range images on-line, it is better for a human operator to view the status of scanning through a graphic display, so that the operator can adjust the view of the next image frames. During 3D scanning, we therefore provide user interaction based on graphic models and image display.

The graphic interaction system presents the current registration status to the user by rendering all range images. The system also shows the position and orientation of each camera by displaying a graphic box and a line, as in Fig. 16. Through the display, the user gets visual feedback to adjust the speed and direction of the camera motion. When registration of a range image fails, the user can adjust the position and orientation of the camera to resume the registration process. In addition, if there are holes in the registered range images, the user can acquire new range images to fill them. In one corner of the display, two images are shown in real time: the current and the previous texture image. The graphical user interface is developed using OpenGL.
6.2 Reconstruction results

The first 3D reconstruction experiment is performed using a plaster model of Beethoven. A total of 40 range images are acquired continuously. In this experiment, the sensor rotates around the object by about 90°. The object is placed in front of a random-dot background, and the background ranges are removed before registration. Figure 17 shows some reconstruction results. Figure 17a–d show the input images, range images of the object areas, selected features, and feature motions. Frames 0, 13, 29, and 39 are shown. In Fig. 17e, the registered point clouds are shown with all camera positions. In Fig. 17f and g, the front view of the registered range images and an integration result are shown. To
Table 2 Registration and integration time (s/frame)

Object          Registration    Integration
                                20 frames    40 frames    60 frames
Beethoven       0.75            330          705          N.A.
Sacheonwang     0.87            220          N.A.         650
Natural scene   0.91            390          1105         N.A.
Table 3 Average registration error

Object          Translation (mm)    Rotation
Beethoven       0.75                0.00052
Sacheonwang     0.82                0.00028
Natural scene   4.55                0.00032
integrate the registered range images into a 3D mesh model, we use the Marching Cubes algorithm [11]. For this reconstruction, the voxel size is set to 3 mm, and a total of 309,520 triangles are generated.
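The integration step can be illustrated with an off-the-shelf Marching Cubes implementation. The snippet below is not the authors' pipeline but a minimal sketch on a synthetic signed-distance volume, assuming scikit-image's `measure.marching_cubes` and the 3 mm voxel size quoted in the text:

```python
import numpy as np
from skimage import measure  # off-the-shelf Marching Cubes [11]

# Synthetic signed-distance volume of a sphere (a stand-in for the
# fused range data); voxel size 3 mm as in the text.
voxel = 3.0
grid = np.arange(-60, 63, voxel)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 45.0  # sphere of radius 45 mm

# Extract the zero iso-surface as a triangle mesh.
verts, faces, normals, values = measure.marching_cubes(
    sdf, level=0.0, spacing=(voxel, voxel, voxel))
print(len(faces), "triangles")
```

The triangle count grows roughly with the surface area divided by the squared voxel size, which is why the 3 mm grid over the Beethoven model yields hundreds of thousands of triangles.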
Figure 18 shows the results of another object, called Sacheonwang. The object is in the museum on our campus, and its height is about 1 m. An operator holds the range camera in hand and moves it around the object. Because of hand shaking, it was not easy to take continuous range images without error; nevertheless, a total of 60 range images are acquired and registered. Due to the illumination conditions in the museum, there are more range errors than for Beethoven; however, the experimental results show that its 3D model is reasonably reconstructed. Figure 19 shows the experimental results of the Natural scene. Due to the inherent noise of natural objects, some parts of the reconstruction show blurred patterns. However, reconstruction from 40 range images yields the natural 3D shape of the scene.
Table 2 presents the registration and integration times of the three experiments. Registration of a pair of range images takes less than 1 s on average. This is an acceptable speed for online registration, because an operator can move the range camera carefully to acquire accurate range images. Table 3 shows the average translation and rotation errors, measured as explained in Sect. 4. The table shows that the registration error is very small after the registration refinement. The translation error of the Natural scene is much higher than that of the other objects. This is due to the noisy background of the scene.
6.3 Reconstruction error analysis
To analyze the reconstruction error of our 3D scanning,
a reconstructed model of Beethoven is compared with a
Fig. 20 Reconstruction error analysis: a ground truth model of Beethoven, b registration of the reconstructed model (green, light grey) to the ground truth
ground truth model. To generate the ground truth model, a NextEngine desktop 3D scanner is used to scan the same object. The scanner is based on the laser-ranging technique. Figure 20a shows the reconstructed ground truth model of Beethoven.

Error analysis is done as follows. First, the reconstructed 3D model from our method is manually overlapped with the ground truth model. Second, using a simple ICP technique, we register our model to the ground truth. We use ICP for this refinement because the ground truth model does not have an image plane, which is required to run IPP. Figure 20b shows the registered models. Third, by uniformly sampling the 3D points in our model, we measure the distance to the closest point in the ground truth. As a result, the average error is about 1.2 mm with a standard deviation of 1.8 mm.
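The third step of the error analysis can be sketched as follows. The sampling scheme and all names are our assumptions; for large models a k-d tree would replace the brute-force closest-point search:

```python
import numpy as np

def reconstruction_error(model_pts, gt_pts, n_samples=1000, seed=0):
    """Closest-point error between a reconstructed model and ground
    truth, in the spirit of Sect. 6.3.

    model_pts, gt_pts: (N, 3) and (M, 3) arrays of 3D points,
    assumed already aligned (e.g. by ICP).
    """
    rng = np.random.default_rng(seed)
    # Uniformly sample points from the reconstructed model.
    idx = rng.choice(len(model_pts), size=min(n_samples, len(model_pts)),
                     replace=False)
    sampled = model_pts[idx]
    # Brute-force closest-point distance to the ground truth.
    diff = sampled[:, None, :] - gt_pts[None, :, :]
    dists = np.sqrt((diff**2).sum(axis=2)).min(axis=1)
    return dists.mean(), dists.std()
```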
7 Conclusion

This paper proposes a new hand-held 3D scanning technique. To date, few investigations have addressed the problem of hand-held 3D scanning. Due to the unstable motion of the human hand and the processing time of pose estimation, conventional 3D shape registration or matching techniques may fail to automatically align a sequence of range images. In this paper, we combine fine and coarse registration of multiple range images to overcome this problem. A sequence of range images obtained by a stereo vision sensor is registered automatically to reconstruct 3D models of real objects. A fast registration refinement technique aligns consecutive range images in a pair-wise manner. If the refinement step fails, a coarse registration technique finds the initial pose of wide-baseline range images to resume the refinement step. A graphic interface displaying the registration status on-line helps a human operator plan the next view of the sensor. Using the proposed technique, we show the 3D reconstruction results of three real objects.
In this paper, we have shown only partial reconstruction results. For complete 3D reconstruction, the surfaces of an object must be closed. Currently, we need to walk around an object to scan all of its visible surfaces. However, it is still a difficult problem to scan and register all surfaces on-line while walking around the object. Two main difficulties are sensor vibration due to the human gait and error propagation. In the future, we will consider the complete 3D modeling problem using a hand-held 3D sensor.
Acknowledgements This work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2007-331-D00423).
References
1. Akbarzadeh, A., Frahm, J.M., Mordohai, P., Clipp, B., Engels, C., Gallup, D., Merrell, P., Phelps, M., Sinha, S., Talton, B., Wang, L., Yang, Q., Stewenius, H., Yang, R., Welch, G., Towles, H., Nister, D., Pollefeys, M.: Towards urban 3D reconstruction from video. In: Proceedings of 3DPVT'06 (2006)
2. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
3. Dias, P., Sequeira, V., Vaz, F., Goncalves, J.G.M.: Registration and fusion of intensity and range data for 3D modelling of real world
scenes. In: Fourth International Conference on 3-D Digital Imaging and Modeling, pp. 418–421 (2003)
4. Hilton, A., Illingworth, J.: Geometric fusion for a hand-held 3D sensor. Mach. Vis. Appl. 12(1), 44–51 (2000)
5. Huber, D., Hebert, M.: 3-D modeling using a statistical sensor model and stochastic search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 858–865 (2003)
6. Jaeggli, T., Koninckx, T.P., Van Gool, L.: Online 3D acquisition and model integration. In: IEEE International Workshop on Projector-Camera Systems, ICCV'03, CD-ROM proceedings (2003)
7. Johnson, A.E., Kang, S.B.: Registration and integration of textured 3D data. Image Vis. Comput. 17(2), 135–147 (1999)
8. Johnson, A.: Spin-images: a representation for 3-D surface matching. Technical Report CMU-RI-TR-97-47, Carnegie Mellon University (1997)
9. Levoy, M., Pulli, K., Curless, B., Rusinkiewicz, S., Koller, D., Pereira, L., Ginzton, M., Anderson, S., Davis, J., Ginsberg, J., Shade, J., Fulk, D.: The digital Michelangelo project: 3D scanning of large statues. In: SIGGRAPH, pp. 131–144 (2000)
10. Liu, Y., Heidrich, W.: Interactive 3D model acquisition and registration. In: Proceedings of the 11th Pacific Conference on Computer Graphics and Applications, pp. 115–122 (2003)
11. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)
12. Matabosch, C., Fofi, D., Salvi, J., Batlle, E.: Registration of surfaces minimizing error propagation for a one-shot multi-slit hand-held scanner. Pattern Recogn. 41(6), 2055–2067 (2008)
13. Park, S.Y., Subbarao, M.: An accurate and fast point-to-plane registration technique. Pattern Recogn. Lett. 24(16), 2967–2976 (2003)
14. Park, S.Y., Baek, J.: Online registration of multi-view range images using geometric and photometric feature tracking. In: The 6th International Conference on 3-D Digital Imaging and Modeling (3DIM) (2007)
15. Popescu, V., Sacks, E., Bahmutov, G.: The model camera: a hand-held device for interactive modeling. In: Proceedings of 3DIM'03, pp. 285–292 (2003)
16. Popescu, V., Sacks, E., Bahmutov, G.: Interactive modeling from dense color and sparse depth. In: Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT) (2004)
17. Rusinkiewicz, S., Hall-Holt, O., Levoy, M.: Real-time 3D model acquisition. Proc. SIGGRAPH 21(3), 438–446 (2002)
18. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152 (2001)
19. Se, S., Jasiobedzki, P.: Instant scene modeler for crime scene reconstruction. In: IEEE Conf. Comput. Vis. Pattern Recogn. Workshops, vol. 3 (2005)
20. Shi, J., Tomasi, C.: Good features to track. In: IEEE Conf. Comput. Vis. Pattern Recogn., pp. 593–600 (1994)
21. Tomasi, C., Kanade, T.: Detection and tracking of point features. Carnegie Mellon University Technical Report CMU-CS-91-132 (1991)
22. Urfalioglu, O., Mikulastik, P., Stegmann, I.: Scale invariant robust registration of 3D-point data and a triangle mesh by global optimization. In: Proceedings of the 8th International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2006), LNCS, pp. 1059–1070 (2006)
23. Yoshida, K., Saito, H.: Registration of range images using texture of high-resolution color images. In: Proceedings of IAPR Workshop on Machine Vision Applications (MVA2002) (2002)
24. Yun, S.U., Min, D., Sohn, K.: 3D scene reconstruction system with hand-held stereo cameras. In: 3DTV Conference, pp. 1–4 (2007)
25. http://www.ces.clemson.edu/~stb/klt/
26. http://www.ptgrey.com