Hand-Held 3D Scanning


Machine Vision and Applications (2011) 22:563–579

DOI 10.1007/s00138-010-0248-1

ORIGINAL PAPER

Hand-held 3D scanning based on coarse and fine registration of multiple range images

Soon-Yong Park · Jaewon Baek · Jaekyoung Moon

Received: 10 March 2009 / Revised: 25 November 2009 / Accepted: 14 January 2010 / Published online: 9 February 2010

© Springer-Verlag 2010

Abstract A hand-held 3D scanning technique is proposed to reconstruct 3D models of real objects. A sequence of range images captured from a hand-held stereo camera is automatically registered to a reference coordinate system. The automated scanning process consists of two states, coarse and fine registration. At the beginning, the scanning process starts at the fine registration state. A fast and accurate registration refinement technique is used to align range images in a pair-wise manner. If the refinement technique fails, the process changes to the coarse registration state. A feature-based coarse registration technique is proposed to find correspondences between the last successful frame and the current frame. If the coarse registration succeeds, the process returns to the fine registration state. A fast point-to-plane refinement technique is employed for shape-based registration. After the shape-based alignment, a texture-based refinement technique matches texture features to enhance the visual appearance of the reconstructed models. Through a graphic and video display, a human operator adjusts the pose of the camera to change the view of the next acquisition. Experimental results show that 3D models of real objects are reconstructed from sequences of range images.

    S.-Y. Park (B)

    Department of Computer Engineering,

    Kyungpook National University, Daegu, 702-701, Korea

    e-mail: [email protected]

    J. Baek

    NeoMtel Incorporation, Seoul, 135-080, Korea

    J. Moon

    School of Electronics, Electrical and Computer Science,

    Kyungpook National University, Daegu, 702-701, Korea

Keywords Hand-held 3D scanning · Multi-view range images · Registration refinement · Coarse registration

    1 Introduction

Three-dimensional (3D) range data of an object can be acquired by several sensing techniques, such as laser ranging, structured light, and stereo vision. Due to recent advances in such sensing techniques, multi-view 3D reconstruction of real objects is of great interest in computer vision and computer graphics research. To reconstruct a complete 3D model of an object, multiple range images should be acquired from different viewing directions to obtain the partial shapes of the object. Then, the range images need to be merged to obtain the 3D model. In general, multi-view 3D reconstruction refers to the complete process of 3D model generation from multiple range data.

Multi-view reconstruction, however, is a very time-consuming task. One reason is the acquisition time of multiple range images. Even though some real-time 3D acquisition systems have been introduced recently, general 3D ranging techniques require a lot of time to obtain multiple range images. For example, the laser ranging technique requires many images of laser stripes even for a single-view reconstruction; therefore, reconstruction of a large scene usually takes several hours. Similarly, the structured light technique needs several pictures of specially designed patterns to obtain a single range image. The stereo vision technique has inherent matching problems, which are complex and time-consuming to solve.

Another reason is the alignment problem of multi-view range images. Each range image captured by a range sensor is represented by an independent coordinate system regardless


of sensor pose. Therefore, range images in different coordinate systems should be aligned with respect to a common coordinate system before being merged into a single 3D model. This process is called registration. Sometimes, the initial poses of range images are known, if they are captured by a calibrated range sensor. However, the initial poses should be refined to reconstruct accurate 3D models. Registration refinement is a very important task in multi-view 3D reconstruction, since the accuracy of a reconstructed 3D model is directly affected by the accuracy of refinement. For these reasons, many registration refinement techniques have been investigated in recent years.

Besl and McKay [2] propose the ICP (Iterative Closest Point) technique. Registration refinement minimizes the transformation error between matching conjugates on different range images (actually, range surfaces). ICP assumes the closest points between them to be the matching conjugates. Based on the ICP technique, various extensions have been investigated. Some of them are as follows. Johnson and Kang [7] modify the ICP technique to combine color information in registration. Levoy et al. [9] introduce a voxel-based registration technique. Their technique needs triangulation of range images to measure the signed distance from a voxel to overlapping range images. Huber and Hebert [5] introduce a graph-based registration technique which can work without initial poses of range images. Urfalioglu et al. [22] use a global optimization technique for uncalibrated registration. ICP can also be used for large-scale city modeling: Akbarzadeh et al. [1] combine INS and GPS data to register multi-view images for city modeling.

Recently, there have been some investigations to reduce the time of multi-view 3D reconstruction based on on-line or interactive methods. Those methods fall into two scanning types. The first type uses a hand-held sensor to scan a fixed object. The other type uses a fixed sensor to scan a hand-held object. Figure 1 shows the two different types

Fig. 1 Registration problem of a hand-held 3D sensor

of on-line scanning. In the left of the figure, an object is held by hand and a fixed sensor is used to capture range images. Suppose the object in hand is rotated by a small angle and two range images are obtained by the sensor before and after rotation. Then, the two poses of the object in the images are close enough to run a refinement algorithm to align the images. This is due to the fixed camera coordinate system. On the contrary, suppose a hand-held sensor is rotated by the same angle, as shown in the right of the figure; then the displacement of the object in the two range images becomes much larger than that of the fixed sensor, due to the rotation of the camera coordinate system.

Because each range image of a hand-held sensor is represented by an independent camera coordinate system, unstable scanning motion can cause registration failure. Figure 2 shows an example. Here, we scan an object from the same distance and direction but at different scanning speeds. In Fig. 2a, registration of the range images fails; thus the figure shows incorrect positions and orientations of the sensor (small white dots are camera centers and short red lines are their orientations). With a slower scanning speed, aligned range images and correct camera poses are obtained, as shown in Fig. 2b.

On-line 3D modeling using a hand-held sensor has the inherent problems mentioned in the previous paragraphs. For this reason, only a few investigations address on-line 3D scanning or modeling problems. Liu and Heidrich [10] propose a stereo-based on-line registration system based on real-time processing hardware. Jaeggli and Koninckx [6] acquire multi-view range images using a pattern projector and register them on-line. Their approach uses a fixed sensor and a turntable to rotate an object; therefore, initial poses of the range images are known in advance. Popescu et al. [15,16] propose a real-time modeling system using a scanning rig which consists of calibrated laser dots and a video camera. The 3D orientations of the projected laser dots are recorded and registered at about five frames per second. Their approach reconstructs 3D surface models by triangulation of sparse 3D points. Rusinkiewicz et al. [17] propose a real-time registration system using a pattern projector and a video camera. The pattern projector projects coded patterns onto a hand-held object, and 3D point clouds of the object are registered in near real-time using a point-to-projection refinement technique. Se and Jasiobedzki [19] introduce a 3D modeling system using a hand-held stereo camera. They use texture features to register a sequence of range images for crime scene reconstruction. Yun et al. [24] use a hand-held stereo camera to acquire range images of indoor scenes. Their off-line process of 3D reconstruction and model registration uses the well-known SIFT algorithm. Hilton and Illingworth [4] use a laser sensor attached to the end of an articulated robotic arm. The arm has six degrees of freedom, and the pose of the sensor is measured with respect to a global coordinate


Fig. 2 a Erroneous registration, b successful registration

system. Therefore, their approach does not need to consider the registration problem. Matabosch et al. [12] propose an on-line registration technique which minimizes the propagation error inherent in pair-wise registration. They use a point-to-plane registration technique similar to that used in our method.

In this paper, we propose an on-line 3D scanning technique based on a hand-held stereo sensor. A sequence of range images captured from the sensor is registered on-line. The on-line scanning process consists of two states, refinement and coarse registration. The process begins at the refinement state. A registration refinement technique continuously registers range images in a pair-wise manner. To overcome refinement failure due to unstable hand motion, a fast coarse registration technique is proposed. If the refinement technique fails, the scanning process changes to the coarse registration state. In this state, we match shape features between two different range images and register them using the matching results. To reduce matching time, we sample depth edges as shape features. The matching process is done in a hierarchical manner for fast processing.

The coarse registration estimates the initial pose between a pair of range images so that the refinement technique can resume. Once the initial pose is close enough, the refinement step begins to register subsequent frames again. The refinement technique uses both 3D shape and texture information for accurate 3D model reconstruction. Texture features sampled by the KLT tracker are used in both shape-based and texture-based registration [21]. Using a graphic and video display, a human operator can see the registered range images and plan the next view interactively. Such interaction enhances the 3D scanning performance.

Experimental results show that the proposed technique can register a long sequence of range images. Using the stereo sensor, we register range images at 1.2 frames per second. Error analysis of the 3D reconstruction results shows a 1.2 mm average error with a 1.8 mm standard deviation. 3D models of real indoor and outdoor objects are also shown.

    2 System overview

Figure 3 shows a flow diagram of the proposed 3D scanning technique. Acquisition of multi-view range images is done by a stereo vision camera, the Bumblebee from Point Grey [26]. Both range and texture images are acquired simultaneously from the camera. To separate foreground objects from the background, we remove portions of the range images which are farther or closer than the working range of 0.3 to 2 m. A range image captured from the camera is registered to the previously aligned range image; therefore, all acquired range images are registered in a pair-wise manner. The first frame from the camera becomes the reference frame, and the others are registered to it.
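The working-range cut described above amounts to a simple depth mask; a minimal sketch, assuming a metric depth map stored as a NumPy array and NaN as the "removed" marker (the 0.3–2 m window is the paper's, the data layout is ours):

```python
import numpy as np

def extract_foreground(depth_m, near=0.3, far=2.0):
    """Remove range pixels outside the working range (values in metres).

    Pixels closer than `near` or farther than `far` are set to NaN so that
    later feature-selection steps can skip them.  The NaN convention is an
    illustrative assumption, not taken from the paper.
    """
    out = depth_m.astype(float).copy()
    out[(out < near) | (out > far)] = np.nan
    return out

depth = np.array([[0.1, 0.5],
                  [1.8, 3.2]])      # one too-close and one too-far pixel
fg = extract_foreground(depth)
```

After masking, only the 0.5 m and 1.8 m pixels remain valid.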

For pair-wise registration, 3D points from two range images are sampled. The samples are chosen by the feature selection routine of the KLT tracker [20,21]. Because the KLT features are usually sampled from high-contrast textures, they are good features to track in a 2D sequence with small motion. Therefore, later in the modeling process, we combine them with the shape-based registration refinement technique.
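The KLT feature selector ranks pixels by the minimum eigenvalue of the local gradient structure tensor (the Shi–Tomasi criterion). A pure-NumPy sketch of that criterion, with illustrative window size and feature count (the paper uses the standard KLT implementation, not this toy version):

```python
import numpy as np

def shi_tomasi_corners(img, k=5, win=3):
    """Return the (row, col) of the k pixels with the largest minimum
    eigenvalue of the local gradient structure tensor -- the criterion
    the KLT feature selector uses."""
    gy, gx = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = gx * gx, gy * gy, gx * gy
    h, w = img.shape
    r = win // 2
    scores = np.zeros_like(img, dtype=float)
    for y in range(r, h - r):
        for x in range(r, w - r):
            sxx = Ixx[y - r:y + r + 1, x - r:x + r + 1].sum()
            syy = Iyy[y - r:y + r + 1, x - r:x + r + 1].sum()
            sxy = Ixy[y - r:y + r + 1, x - r:x + r + 1].sum()
            # min eigenvalue of the 2x2 tensor [[sxx, sxy], [sxy, syy]]
            scores[y, x] = 0.5 * (sxx + syy - np.hypot(sxx - syy, 2 * sxy))
    idx = np.argsort(scores.ravel())[::-1][:k]
    return [divmod(int(i), w) for i in idx]

# a bright square: its four corners carry gradient energy in both directions
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
top = shi_tomasi_corners(img, k=4)
```

Edge midpoints score near zero (one gradient direction only), so the selected features cluster at the square's corners.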

To use 3D features associated with the 2D features, we remove KLT features which have no depth values. Because background ranges are removed beforehand, 2D features in the background are removed as well. 3D features on the range images are registered iteratively by shape-based registration followed by texture-based registration. For the given 3D features in the current range image, their correspondences are determined by a point-to-plane registration technique [13]. The transformation between the current and


Fig. 3 Flow diagram of the proposed 3D registration technique (range/texture acquisition and foreground extraction, followed by 2D/3D feature selection, point-to-plane and texture-based registration in the fine state, and depth edge sampling and matching in the coarse state; the registration error and the number of control points, fewer than 10, switch between the two modes)

the previous range image is derived in a least-squares manner.

After shape-based registration, a modified KLT tracker refines the registration result again. 2D features are used in the texture-based registration step. Owing to the shape-based registration refinement, the projections of the 2D features in one texture image are close to their correspondences in the other image; thus, the processing time of feature matching is reduced. We modified the original KLT tracking algorithm to make the correspondence search fast and accurate [14].

The 3D scanning scheme consists of two states. As shown in Fig. 4, the scanning scheme starts at the fine registration state. In each registration refinement step, we measure the registration error to determine whether the alignment is successful. If the current registration is determined to have failed, we change the scanning state to coarse registration. Based on our experiments, once a human operator fails to register a range image, it is difficult for the operator to bring the sensor very close to the last successful pose. Even though a graphical user interface is displayed on-line during acquisition, a refinement algorithm alone is not enough to recover from the failure. Instead, we introduce a coarse registration technique. After successful coarse registration, we resume registration refinement. The scanning process continues until enough range images are obtained and registered.
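The two-state control flow above can be sketched as a tiny state machine; the success flags stand in for the registration-error test on the current frame, and all names are illustrative:

```python
from enum import Enum

class State(Enum):
    FINE = "fine registration"
    COARSE = "coarse registration"

def scan_step(state, fine_ok, coarse_ok):
    """One transition of the scanning state machine (sketch of Fig. 4).

    `fine_ok` / `coarse_ok` stand in for the per-frame registration-error
    check; the real system derives them from the measured pose error.
    """
    if state is State.FINE:
        return State.FINE if fine_ok else State.COARSE
    return State.FINE if coarse_ok else State.COARSE

# refinement fails -> fall back to coarse registration, then recover
s = State.FINE
s = scan_step(s, fine_ok=False, coarse_ok=False)   # drop to COARSE
s = scan_step(s, fine_ok=False, coarse_ok=True)    # coarse success -> FINE
```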

Fig. 4 State diagram of the proposed 3D scanning process (pair-wise refinement and coarse registration states, with success/fail transitions driven by the registration error)


    3 Registration refinement

    3.1 Shape-based range image refinement

In this section, we describe a shape-based registration refinement technique. In general, there are three main categories of shape-based registration techniques; Figure 5 shows simple diagrams of them. In point-to-point registration, the conjugate of a control point p on the source surface is determined as the closest point q on the destination surface. The error metric ds is the distance between the two control points.

Point-to-plane registration is another common technique. It searches for the intersection on the destination surface along the normal vector of the source point. As shown in Fig. 5b, the destination control point q' is the projection of p onto the tangent plane at q, which is the intersection along the normal of p. The point-to-projection approach is known to be a fast registration technique. As shown in Fig. 5c, this approach determines a point q, the conjugate of a source point p, by forward-projecting p from the point of view of the destination, OQ. To determine the projection point, p is first backward-projected to a 2D point pQ on the range image plane of the destination surface, and then pQ is forward-projected to the destination surface to get q. This algorithm is very fast because it does not include any search step to find the correspondence. However, one of its disadvantages is that the result of registration is not as accurate as those of the others [17].

In this paper, we use a point-to-plane technique called the IPP (iterative projection point) method for shape-based registration [13]. This technique is combined with the point-to-projection technique to reduce processing time. Let us briefly explain the IPP method. To align two surfaces S and D, shown in Fig. 6, we need to find correspondences between the two surfaces. For example, P0 in S corresponds to Q in surface D, which is the intersection of the normal vector at P0. First, we project P0 into ID, the 2D image of surface D, and find the coordinate pD. Then we can find the point Qr on surface D that corresponds to the coordinate pD using the range image of D. A 3D point P1 is found by projecting Qr onto the normal of P0. Applying this iteratively n times, we can find Q, which is the convergence point of Pn. Q is then determined by the

Fig. 5 Three categories of registration techniques. a Point-to-point, b point-to-plane, c point-to-projection

Fig. 6 Iterative projection point algorithm

projection of P0 onto the tangent plane of Q. Using K corresponding pairs Pk0 and Qk, we find the transformation matrix T = [R|t] that minimizes the registration error between the two point sets using Eq. (1). We use the SVD (singular value decomposition) method to solve the equation.

\epsilon = \sum_{k=1}^{K} \| Q_k - (R P_0^k + t) \|^2. (1)
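Once the correspondences are fixed, Eq. (1) has the standard closed-form SVD solution (the Kabsch alignment); a minimal sketch assuming NumPy and (K, 3) arrays of matched points, with illustrative names:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares [R|t] minimizing sum_k ||Q_k - (R P_k + t)||^2.

    Standard SVD solution of Eq. (1); P, Q are (K, 3) arrays of matched
    control points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # proper rotation (det = +1)
    t = cq - R @ cp
    return R, t

# recover a known 90-degree rotation about z plus a translation
rng = np.random.default_rng(0)
P = rng.standard_normal((20, 3))
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
Q = P @ Rz.T + np.array([0.5, -1.0, 2.0])
R, t = rigid_transform(P, Q)
```

The determinant correction `D` guards against the reflection case that plain `V U^T` can produce.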

This process calculates the Euclidean transformation matrix T between the two surfaces repeatedly. Registration error is measured using the rotation R and translation t of the transformation. We find that the registration error typically no longer decreases after about 30 iterations. In general, the source control point set P can be selected by sampling the source surface randomly or uniformly, and filtered by some constraints to delete unreliable control points. The selection of IPP control points is described in the next section.
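The IPP correspondence search itself can be sketched for a simplified setting: here an orthographic height map `depth(x, y)` stands in for the calibrated destination range image, which is an assumption of this sketch, not the paper's stereo geometry. Each iteration back-projects the current estimate to the range map, reads the surface point Q_r there, and projects Q_r back onto the normal line of P0:

```python
import numpy as np

def ipp_correspondence(p0, n, depth, iters=30):
    """Iterative projection point (IPP) sketch for an orthographic range map.

    The fixed point of the iteration approximates the intersection Q of the
    normal line of p0 with the destination surface."""
    p0 = np.asarray(p0, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    p = p0.copy()
    for _ in range(iters):
        # backward projection to image coordinates, forward projection to surface
        qr = np.array([p[0], p[1], depth(p[0], p[1])])
        # project Q_r onto the normal line through p0
        p = p0 + np.dot(qr - p0, n) * n
    return p

# destination surface: the plane z = 0.1 x + 0.2 y + 1
plane = lambda x, y: 0.1 * x + 0.2 * y + 1.0
p0 = np.array([0.0, 0.0, 0.0])
n = np.array([0.3, 0.1, 1.0])
q = ipp_correspondence(p0, n, plane)
```

For a gently sloped surface the iteration contracts quickly, consistent with the roughly 30 iterations the paper reports.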

    3.2 Selection of IPP control points

To run IPP, some 3D points must be sampled as control points. The sampling of control points usually affects registration performance [18]. In this paper, a couple of constraints are applied to select the control points, as follows.

- Texture features of the KLT tracker are used as the control points.
- Invalid features are removed if they are in the background or have no 3D correspondences. If the distance between 3D correspondences is too long, they are regarded as invalid correspondences.
- As shown in the left of Fig. 7, only 2D features in the object area are used in both the shape-based and texture-based refinement steps.
- In the right of the figure, the normal vector of each point is compared with that of the viewpoint to remove unreliable


Fig. 7 Feature point selection. Reliable KLT features are sampled and out-of-range points are removed

Fig. 8 a Pair-wise registration problem, b initialization using the previous transformation

features. We remove a control point if the dot product of the two vectors is greater than 0.5.
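The constraints above reduce to a per-feature filter; a sketch assuming parallel arrays of features, depths, and unit normals (the data layout and the sign convention of the viewing direction are ours, the 0.5 dot-product threshold is the paper's):

```python
import numpy as np

def select_control_points(features, depths, normals, view_dir, thresh=0.5):
    """Filter KLT features into IPP control points (sketch of Sect. 3.2).

    Keeps features that have a valid depth and pass the normal/viewpoint
    dot-product test stated in the paper."""
    view_dir = np.asarray(view_dir, float)
    view_dir = view_dir / np.linalg.norm(view_dir)
    kept = []
    for f, d, n in zip(features, depths, normals):
        if d is None or np.isnan(d):        # background / no 3D correspondence
            continue
        n = np.asarray(n, float)
        n = n / np.linalg.norm(n)
        if np.dot(n, view_dir) > thresh:    # unreliable normal (paper's test)
            continue
        kept.append(f)
    return kept

feats = ["a", "b", "c"]
depths = [1.0, float("nan"), 1.2]
normals = [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0), (0.0, 0.0, 1.0)]
kept = select_control_points(feats, depths, normals, view_dir=(0.0, 0.0, 1.0))
```

Feature "b" is dropped for its missing depth and "c" for failing the normal test, leaving only "a".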

    3.3 Pair-wise registration

When consecutive range images are registered to a reference coordinate system, the initial registration error needs to be small. Figure 8a shows a simple case. After shape Sb is registered to Sa, the next shape Sc should be registered to Sb, which is already aligned. However, there is a large initial error between Sb and Sc, which makes it difficult for shape Sc to be aligned to Sb. If the initial registration error εi between two range images is large, the refinement step is likely to fail to align them.

A simple solution to this problem is to transform the current range image based on the previous alignment result. This brings the current range image into the coordinates of the previous range image, which is already in the common coordinate system. In Fig. 8b, the transformation Tab which brings shape Sb to Sa is applied to surface Sc to initialize registration.
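The initialization of Fig. 8b is just the application of the previous 4×4 transform to the incoming point set; a minimal sketch, where the homogeneous-matrix convention is an assumption for illustration:

```python
import numpy as np

def initialize_with_previous(points_c, T_ab):
    """Apply the previous alignment T_ab (which brought S_b onto S_a) to the
    points of the incoming frame S_c, so refinement starts from a small
    initial error (Fig. 8b)."""
    pts_h = np.c_[points_c, np.ones(len(points_c))]   # to homogeneous coords
    return (pts_h @ T_ab.T)[:, :3]

# translation-only example: shift the new frame by the previous motion estimate
T_ab = np.eye(4)
T_ab[:3, 3] = [0.0, 0.0, -0.1]          # previous inter-frame translation
S_c = np.array([[0.0, 0.0, 1.0],
                [1.0, 1.0, 1.5]])
S_c0 = initialize_with_previous(S_c, T_ab)
```

Under a roughly constant scanning motion, the previous transform is a good predictor of the current one, which is the assumption this initialization relies on.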

    3.4 Texture-based registration method

Texture-based registration techniques employ 2D feature correspondences to estimate the 3D pose between range images

Fig. 9 Comparison of the KLT search range a before and b after shape-based registration

[3,23,24]. When there is a large pose difference, tracking 2D texture features is not easy due to the wide baseline between views. In consequence, tracking time increases and the success rate decreases. One may filter out unsuccessful features by using a RANSAC algorithm; however, this also takes extra time for pose estimation. For accurate tracking, the sizes of the search region and tracking window must be traded off against tracking performance and time.

In this paper, we improve the performance of correspondence tracking by reducing the search range. The original KLT tracking algorithm is modified by employing the results of shape-based registration. In the original KLT tracker, the coordinates of the starting point of the correspondence search in the current image are always the same as those in the previous image. In Fig. 9a, a feature point pS in image plane IS is shown also in the other image plane ID. The original KLT tracker starts searching for its correspondence qD from the coordinates of pS. Therefore, the existing KLT tracker needs a wide search area to find the correspondence if the images are obtained from very different viewing directions.

Figure 9b shows the modified KLT method used in our experiments. When the range of a 2D feature pS is denoted as rS(pS), we can compute p'S from the projection of rS(pS) to


Fig. 10 Feature tracking comparison. a Modified KLT, b original KLT

ID. When the range image rS is registered by transformation TS, it is projected to image ID by multiplying by the perspective projection matrix M. The projection matrix is obtained from the calibrated camera in advance.

p'_S = M T_S r_S(p_S) (2)
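Equation (2) is a standard projective prediction; a sketch with toy matrices (identity pose, unit-focal pinhole camera), where all values are illustrative:

```python
import numpy as np

def predict_feature(r_s, T_s, M):
    """Predict the KLT search start p'_S = M T_S r_S(p_S) of Eq. (2).

    r_s is the homogeneous 3D point behind feature p_S, T_s the 4x4 pose
    from shape-based registration, M the 3x4 perspective projection matrix
    of the calibrated camera."""
    x = M @ (T_s @ r_s)          # project the registered 3D point
    return x[:2] / x[2]          # dehomogenize to pixel coordinates

# identity pose, unit-focal pinhole camera
M = np.hstack([np.eye(3), np.zeros((3, 1))])
r = np.array([0.2, 0.4, 2.0, 1.0])
p = predict_feature(r, np.eye(4), M)   # point at depth 2 projects to (0.1, 0.2)
```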

When the shape-based registration result is considered, the starting coordinates of the KLT tracker become very close to the correspondence. This, however, assumes that the shape-based registration has a small registration error. In most cases, the registration result is accurate enough to bring the 2D correspondences very close together. Thus, using a small search region, we can track texture features very fast to further refine the pose.

Figure 10 shows an example of texture feature tracking. In the figure, the green dots represent texture features and the white lines represent tracking results between two consecutive images. Using a smaller tracking window than that of the original KLT, almost all features are successfully tracked by the modified KLT. Because most features are successfully tracked, we can estimate the transformation without any RANSAC algorithm.
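The benefit of the predicted starting point can be illustrated with a toy tracker: plain SSD template matching over a tiny search window stands in for the KLT solver here, an assumption of this sketch rather than the paper's implementation:

```python
import numpy as np

def track_feature(img_prev, img_cur, p_prev, p_pred, win=2, search=2):
    """Track one feature with a small search window around its prediction.

    Because shape-based registration gives a prediction p_pred close to the
    true correspondence, an SSD search over a +/-`search` pixel window
    suffices (the original KLT would need a much larger one)."""
    y0, x0 = p_prev
    tmpl = img_prev[y0 - win:y0 + win + 1, x0 - win:x0 + win + 1].astype(float)
    best, best_cost = p_pred, np.inf
    yp, xp = p_pred
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = yp + dy, xp + dx
            cand = img_cur[y - win:y + win + 1, x - win:x + win + 1].astype(float)
            cost = ((cand - tmpl) ** 2).sum()
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best

# a bright blob shifted by (1, 2); the prediction is already near the new spot
rng = np.random.default_rng(1)
a = rng.random((20, 20))
a[8:11, 8:11] += 5.0
b = np.roll(np.roll(a, 1, axis=0), 2, axis=1)
found = track_feature(a, b, p_prev=(9, 9), p_pred=(10, 10))
```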

    4 Handling registration failure

Hand-held 3D scanning is a convenient way to reconstruct 3D models of real objects. However, one drawback of this method is that the view of the 3D sensor is controlled by hand. One may think that hand motion is easily controllable by a human operator. However, in terms of 3D registration, a very small motion of the hand yields a large displacement between consecutive images. For this reason, conventional 3D scanning is done with fixed, off-the-shelf systems.

To reconstruct the 3D shapes of real objects, we need to acquire as many range and texture images as possible. Therefore, we need to cope with the hand motion during scanning. During the registration refinement state, we check whether the alignment of the current image is successful. If not, we move the state to coarse registration. In the coarse registration state, a range image is captured and matched with the previously aligned range image. If this is not successful, the capturing and matching steps are repeated until it is. Once the coarse registration is successful, the scanning state returns to the refinement step.

The decision of coarse registration success is made by measuring the

pose error between the correspondences. Let T_{n-1} and T_n be the transformation matrices of the (n-1)th and nth range images. Then the transformation

T_{n-1,n} = T_{n-1} T_n^{-1} (3)

can be considered as an error between them. To measure the registration error between the (n-1)th and nth range images, we apply T_{n-1} and T_n^{-1} to the corresponding range images and compute the rotation and translation error between them. From the matrix T_{n-1,n} = [r_{ij} | t_i], the rotation and translation errors are computed as follows.

\epsilon_R = \frac{\sum_{i=0}^{2} (I_{ii} - r_{ii})^2}{3}, (4)

\epsilon_t = \frac{\sum_{k=1}^{K} \| Q_k - P_0^k \|}{K}, (5)

where K is the number of correspondences between the images. The rotation error is measured by the difference between the identity matrix and the rotation matrix. Rather than using the translation vector in T_{n-1,n}, we use the average distance between the correspondences as the measure of translation. As mentioned in an earlier section, we provide a human operator with a graphic user interface to plan the view of the camera and check the status of on-line scanning.
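Equations (3)–(5) can be sketched directly; the exact normalization of the rotation error in Eq. (4) is reconstructed from the garbled source and may differ from the authors' code, as the comment notes:

```python
import numpy as np

def registration_error(T_prev, T_cur, P0, Q):
    """Rotation/translation error check of Eqs. (3)-(5) (sketch).

    T_{n-1,n} = T_prev @ inv(T_cur) measures the pose discrepancy; the
    rotation error compares the diagonal of its rotation part with the
    identity, and the translation error is the mean distance between the
    correspondences Q_k and P0_k.  The Eq. (4) normalization is an
    assumption reconstructed from the text."""
    T = T_prev @ np.linalg.inv(T_cur)                           # Eq. (3)
    r = T[:3, :3]
    err_R = sum((1.0 - r[i, i]) ** 2 for i in range(3)) / 3.0   # Eq. (4)
    err_t = np.mean(np.linalg.norm(Q - P0, axis=1))             # Eq. (5)
    return err_R, err_t

# identical poses give zero rotation error; points offset by 0.1 per axis
T = np.eye(4)
P0 = np.array([[0.0, 0.0, 1.0],
               [1.0, 0.0, 1.0]])
err_R, err_t = registration_error(T, T, P0, P0 + 0.1)
```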

    5 Coarse registration

    5.1 Sampled depth-edge block matching

In this section, we propose a coarse registration technique to match two wide-baseline range images. We call our coarse


Fig. 11 Matching strategy of SDBM

Fig. 12 Sampling of depth edge points. From left: original depth edges, uniform sampling, and complete depth points

Fig. 13 An example of SDBM. a Matched depth points, b before registration, c after registration

registration method SDBM (Sampled Depth edge Block Matching). Suppose there are two range images, source and destination, as in Fig. 11. The source range image can be considered the current image and, similarly, the destination image the last range image already aligned in the reference coordinate system. From the source image, we pick some shape features which lie on the edges of the range image. Edge features in the range image are chosen because they are independent of texture change, and they can be acquired directly from the range image. In this paper, we apply the Sobel filter to find edge features, with a fixed threshold value used to determine them. Determining the


Fig. 14 Ten test pairs of coarse registration. From left: destination and source frames, initial poses, depth and features of the two frames. From top: BT1, BT2, SB1, SB2, SM1, SM2, SC1, SC2, SL1, and SL2

threshold value is not critical in this case, because the edge features are sampled later to reduce the number of features. Given a sample point si in the source, we find a matching point in the destination range image. To find the matching point, we define two search regions, R1 and R2. A square region R1 is defined by placing its center at the same coordinates as those of si. Let one of the depth edge points in R1 be di. Another square region R2 is defined similarly, with


Fig. 15 Results of coarse registration. From left to right: point clouds and textured results of KLT, SPIN, and SDBM. The point clouds show the poses before applying IPP, but the textured results are shown after applying IPP

its center at di. In R2, a matching window WD is defined at each point di, and its cost is measured with respect to WS, the matching window of si. The second search region R2 is defined at every destination point in R1. The matching pair (si, di) is therefore the one yielding the least cost value between WS and WD.

Let nW, nR2, and nR1 be the number of depth points in the matching window, the number of valid points in R2, and


Table 1 Comparison of registration error (mm)

Object  Method  Initial  Coarse  Coarse + IPP  S(uccess)/F(ail)
BT1     KLT     48.3     19.1    0.9           S
        SPIN    63.7     42.3    1.3           S
        SDBM    40.3     10.6    1.2           S
BT2     KLT     43.7     30.9    0.8           S
        SPIN    74.5     64.8    1.1           S
        SDBM    37.1     2.7     0.7           S
SB1     KLT     7.9      -       -             F
        SPIN    115.4    96.9    2.6           S
        SDBM    54.2     43.8    3.9           S
SB2     KLT     48.4     25.9    1.5           S
        SPIN    84.9     79.3    3.1           S
        SDBM    48.5     33.9    2.8           S
SM1     KLT     55.5     21.6    0.5           S
        SPIN    62.8     40.5    2.7           S
        SDBM    57.3     6.6     0.5           S
SM2     KLT     87.8     59.2    5.9           S
        SPIN    138.8    130.9   130.9         F
        SDBM    72.5     64.8    6.9           F
SC1     KLT     42.4     8.7     2.6           S
        SPIN    54.1     24.3    2.2           S
        SDBM    42.4     0.9     0.8           S
SC2     KLT     24.4     8.7     0.7           S
        SPIN    44.7     43.7    2.6           F
        SDBM    24.1     1.2     0.7           S
SL1     KLT     28.8     14.4    4.3           F
        SPIN    38.2     10.0    0.4           S
        SDBM    38.1     0.8     0.3           S
SL2     KLT     30.6     17.9    3.4           F
        SPIN    103.0    88.2    4.0           F
        SDBM    61.1     6.2     0.6           S

    the number of depth edge points in R1, respectively. Then

    computational complexity of finding the matching point is

    O(nw nR2 nR1). Cost C(si , di ) between the two match-

    ingwindows is measuredby themean-normalizedSSD (Sum

    of squared difference) as follows:

\bar{W}_S = \frac{1}{n_W} \sum_{(i,j) \in W_S} r_S(i,j), \qquad
\bar{W}_D = \frac{1}{n_W} \sum_{(i,j) \in W_D} r_D(i,j),        (6)

C(s_i, d_i) = \frac{1}{n_W} \sum_{(i,j) \in W_S} \left( (r_S(i,j) - \bar{W}_S) - (r_D(i,j) - \bar{W}_D) \right)^2,        (7)

where rS(i, j) and rD(i, j) are the depth values at pixel (i, j) of the corresponding matching windows.
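Equations (6) and (7) translate directly into code. The sketch below assumes both windows are given as equally sized dense arrays; hole handling is left out.

```python
import numpy as np

def mean_normalized_ssd(win_s, win_d):
    """Cost C(s_i, d_i) of Eq. (7): mean-normalized SSD between the
    source window W_S and the destination window W_D.

    win_s, win_d: equally sized 2D arrays of depths r_S and r_D.
    """
    n_w = win_s.size
    mean_s = win_s.sum() / n_w      # Eq. (6), source window mean
    mean_d = win_d.sum() / n_w      # Eq. (6), destination window mean
    diff = (win_s - mean_s) - (win_d - mean_d)
    return float((diff ** 2).sum() / n_w)
```

Subtracting each window's own mean makes the cost insensitive to a constant depth offset between the two views.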

Fig. 16 Graphic user interface to assist a user in adjusting the sensor pose

Depth edge points used in the matching algorithm are sampled from the original range images. As mentioned before, only depth edge points are used in order to exploit the high curvature of the edges. To make the matching process fast and reliable, we reject some depth edge points as follows. First, depth edge points are sampled as shown in Fig. 12; in the left image, red-colored (dark-grey) points are edge points sampled from a range image. Second, we uniformly sample again to reduce the number of edge points, as shown in the center of the figure. Finally, we remove incomplete points. Here, we regard a point as incomplete if any neighborhood in its matching window W is a hole.
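The rejection steps above (uniform subsampling of the edge points and discarding points whose matching window touches a hole) might be sketched as follows; the stride and the 15 × 15 window half-size are assumed values.

```python
import numpy as np

def sample_edge_points(depth, edge_mask, stride=4, half_win=7):
    """Subsample depth edge points and drop incomplete ones.

    A point is 'incomplete' if any pixel of its matching window W
    (here 15x15, so half_win=7) is a hole (depth == 0).
    """
    h, w = depth.shape
    points = []
    ys, xs = np.nonzero(edge_mask)
    for y, x in zip(ys, xs):
        if y % stride or x % stride:          # uniform subsampling
            continue
        y0, y1 = y - half_win, y + half_win + 1
        x0, x1 = x - half_win, x + half_win + 1
        if y0 < 0 or x0 < 0 or y1 > h or x1 > w:
            continue                          # window leaves the image
        if (depth[y0:y1, x0:x1] == 0).any():
            continue                          # incomplete: window hits a hole
        points.append((y, x))
    return points
```

Rejecting incomplete points up front keeps the SSD cost of Eq. (7) well defined for every surviving candidate.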

We do the depth point sampling in both range images. The local matching point in R2 is determined by

local(d_i) = \arg\min_{j} \, cost(s_i, d_j \in R_2).        (8)

The final matching point is the global minimum over s_i, which can be computed as

global(s_i) = \arg\min_{j} \, cost(local(d_j) \in R_1).        (9)
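A direct reading of Eqs. (8) and (9) gives the nested search below, with the O(nW · nR2 · nR1) complexity noted earlier. The data layout (candidate lists with precomputed windows) is an assumed simplification of the paper's pipeline.

```python
import numpy as np

def find_matching_point(win_s, candidates_r1, windows_r2, cost):
    """Nested search of Eqs. (8) and (9) (sketch).

    win_s: matching window W_S of the source point s_i.
    candidates_r1: depth edge points d_i inside R1.
    windows_r2: windows_r2[d] lists (point, window) pairs inside
                the region R2 centred at d.
    cost: cost function, e.g. the mean-normalized SSD of Eq. (7).
    Returns (best destination point, best cost).
    """
    best_point, best_cost = None, np.inf
    for d in candidates_r1:                   # outer loop over R1
        # Eq. (8): local minimum within the region R2 around d
        local_point, local_cost = None, np.inf
        for p, win_d in windows_r2[d]:        # inner loop over R2
            c = cost(win_s, win_d)
            if c < local_cost:
                local_point, local_cost = p, c
        # Eq. (9): keep the global minimum over all local minima
        if local_cost < best_cost:
            best_point, best_cost = local_point, local_cost
    return best_point, best_cost
```

As described next in the text, the resulting pair is still rejected if its cost exceeds a fixed threshold.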

Fig. 17 Results of Beethoven: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model

Even though a destination point is found to be the global minimum, its cost C(si, di) should be less than a threshold value; if not, we reject the point. Figure 13 shows an example of coarse matching. In Fig. 13a, solid lines show matching pairs of features between two range images. In Fig. 13b and c, the two 3D shapes are displayed together to compare the poses before and after coarse registration. In this case, 10 matching pairs are used for registration. The matching window size is 15 × 15, and the sizes of R1 and R2 are 100 and 50, respectively. Computing the transformation matrix from the matching pairs is done by the same method as in fine registration. Registration errors before and after registration are 38.54 and 3.58 mm, respectively. The initial registration error is reduced enough to start the refinement.

    5.2 Comparison of coarse matching

The proposed coarse registration technique can be considered a 3D shape matching technique. To match 3D shapes obtained from different views, other conventional shape matching techniques can be considered; the spin image is one example. Conventional 2D matching techniques can also be used, because a texture image is associated with every range image. In this section, we compare the performance of our 3D matching technique with two other techniques, the spin image [8] and KLT [21].

Fig. 18 Results of Sacheonwang: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model

The spin image is a 3D shape matching technique. A spin image is a 2D space of α and β, which are mappings of the 3D measure of a point with respect to its neighboring surface points. To run the spin image technique, the image size is set to 40 × 40 and the length of each bin of the image is set to 8 mm, which covers 160 mm from the measured point. To run KLT, the original KLT algorithm is used and the size of the matching window is set to 15 × 15, the same as that of our matching block W. For a fair comparison, we use the same number of features, which are extracted from SDBM.
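The (α, β) mapping of a spin image can be sketched as follows, based on Johnson's formulation [8]: α is the radial distance from the normal line through the oriented point, β the signed elevation along the normal. The image size and bin length follow the values quoted above; the binning details are illustrative assumptions.

```python
import numpy as np

def spin_image(points, p, n, image_size=40, bin_len=8.0):
    """Accumulate a spin image for the oriented point (p, n) (sketch).

    points: (N, 3) neighbouring surface points (mm).
    p: 3D position of the oriented point; n: its unit normal.
    bin_len=8 mm with a 40x40 image covers 160 mm from p.
    """
    rel = points - p
    beta = rel @ n                                  # elevation along n
    alpha = np.sqrt(np.maximum((rel ** 2).sum(axis=1) - beta ** 2, 0.0))
    img = np.zeros((image_size, image_size))
    # beta is signed, so shift its bin index by half the image height
    i = ((beta / bin_len) + image_size // 2).astype(int)
    j = (alpha / bin_len).astype(int)
    ok = (i >= 0) & (i < image_size) & (j >= 0) & (j < image_size)
    np.add.at(img, (i[ok], j[ok]), 1.0)             # 2D histogram
    return img
```

Because (α, β) discards the azimuth around the normal, the representation is rotation invariant, which is exactly the extra degree of freedom blamed below for SPIN's occasional failures.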

Five objects are used for this comparison, as shown in Fig. 14. The two test frames shown in the first and second columns of the figure are sampled from the video sequence of each object. The corresponding range images are shown in the last two columns. The test range images are sampled so that the initial pose difference becomes too wide to register them with our refinement algorithm. In the range images, the depth features extracted by the SDBM are overlaid. In the third column, the two range images are overlapped to show the initial pose between the two frames.

Fig. 19 Results of Natural scene: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model

Figure 15 shows the coarse registration results. The first three columns show the results of KLT, SPIN, and SDBM; the second three columns show the same results with textured points. KLT and SDBM show very similar results, while SPIN sometimes yields erroneous results. The main reason is that SPIN has more degrees of freedom than KLT and SDBM due to its rotation and scale invariance. The first three columns of the figure show the pose right after applying the SDBM technique. The last three columns show textured range images after applying IPP, which follows SDBM. Table 1 shows the registration error measured between two range images after applying each matching technique. In the table, the decision of success or failure is given by visual inspection of the results. The registration error is measured as the average distance between matching pairs.
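The error metric of Table 1 — the average distance between matching pairs — is simple to state in code. This sketch assumes the pairs are given as two aligned point arrays in a common coordinate system.

```python
import numpy as np

def registration_error(src, dst):
    """Average Euclidean distance (mm) between matched 3D pairs.

    src, dst: (N, 3) arrays where src[k] and dst[k] form a pair,
    both already expressed in the same coordinate system.
    """
    return float(np.linalg.norm(src - dst, axis=1).mean())
```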

    6 Experimental results

The stereo vision camera generates 15 range images per second. The resolution of a range image is 320 × 240. The proposed 3D scanning technique is applied to two real objects and one natural scene. A computer with a Pentium 3.4 GHz CPU is used. It takes about 800 ms to register each range image when both the shape-based and texture-based registration refinements are applied. When only the shape-based registration is used, about 600 ms is needed.

    6.1 Online graphic interaction

To register range images on-line, it is better for a human operator to view the status of scanning through a graphic display, so that the operator can adjust the view of the next image frames. During 3D scanning, we therefore provide user interaction based on graphic models and image display.

The graphic interaction system presents the current registration status to the user by rendering all range images. The system also shows the position and orientation of each camera by displaying a graphic box and a line, as in Fig. 16. Through the display, the user gets visual feedback to adjust the speed and direction of the camera motion. When registration of a range image fails, the user can adjust the position and orientation of the camera to resume the registration process. In addition, if there are holes in the registered range images, the user can acquire new range images to fill them. In one corner of the display, two images are shown in real time: the current texture image and the previous texture image. The graphical user interface is developed with OpenGL.

    6.2 Reconstruction results

Table 2 Registration and integration time

Object         Registration (s/frame)   Integration (s)
                                        20 frames   40 frames   60 frames
Beethoven      0.75                     330         705         N.A.
Sacheonwang    0.87                     220         N.A.        650
Natural scene  0.91                     390         1105        N.A.

Table 3 Average registration error

Object         Translation (mm)   Rotation
Beethoven      0.75               0.00052
Sacheonwang    0.82               0.00028
Natural scene  4.55               0.00032

The first 3D reconstruction experiment is performed using a plaster model of Beethoven. A total of 40 range images are acquired continuously. In this experiment, the sensor rotates around the object by about 90°. The object is placed in front of a random-dot background, and the background ranges are removed before registration. Figure 17 shows some reconstruction results: Fig. 17a–d show the input images, the range images of the object areas, the selected features, and the feature motions; frames 0, 13, 29, and 39 are shown. In Fig. 17e, the registered point clouds are shown with all the camera positions. In Fig. 17f and g, the front view of the registered range images and an integration result are shown. To integrate the registered range images into a 3D mesh model, we use the Marching Cubes algorithm [11]. For this reconstruction, the voxel size is set to 3 mm, and a total of 309,520 triangles are generated.
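Before Marching Cubes [11] can run, the registered point clouds must be resampled into a voxel grid. A minimal occupancy-style voxelization with 3 mm voxels might look like this; the distance-field construction of the actual pipeline is not specified in the text, so this is only the resampling step.

```python
import numpy as np

def voxelize(points, voxel=3.0):
    """Map registered 3D points (mm) into a dense voxel grid.

    Each occupied cell counts how many points fall into it; a
    Marching Cubes step would then extract an iso-surface from a
    field built on such a grid.
    """
    origin = points.min(axis=0)
    idx = np.floor((points - origin) / voxel).astype(int)
    dims = idx.max(axis=0) + 1
    grid = np.zeros(dims, dtype=np.int32)
    np.add.at(grid, tuple(idx.T), 1)      # accumulate point counts
    return grid, origin
```

The 3 mm voxel size quoted above trades mesh resolution against the number of generated triangles.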

Figure 18 shows the results for another object, called Sacheonwang. The object is in the museum of our campus, and its height is about 1 m. An operator holds the range camera in hand and moves it around the object. Because of hand shaking, it was not easy to take continuous range images without error; nevertheless, a total of 60 range images are acquired and registered. Due to the illumination conditions of the museum, there are more range errors than for Beethoven; however, the experimental results show that its 3D model is reasonably reconstructed. Figure 19 shows the experimental results for Natural scene. Due to the inherent noise of natural objects, some parts of the reconstruction show blur patterns. However, the reconstruction from 40 range images yields the natural 3D shape of the scene.

Table 2 presents the registration and integration times of the three experiments. Registration of a pair of range images takes less than 1 s on average. This is an acceptable speed for online registration, because an operator can move the range camera carefully to acquire accurate range images. Table 3 shows the average translation and rotation errors, measured as explained in Sect. 4. The table shows that the registration error is very small after the registration refinement. The translation error of Natural scene is much higher than that of the others; this is due to the noisy background of the scene.

    6.3 Reconstruction error analysis

Fig. 20 Reconstruction error analysis: a ground truth model of Beethoven, b registration of the reconstructed model (green, light grey) to the ground truth

To analyze the reconstruction error of our 3D scanning, a reconstructed model of Beethoven is compared with a ground truth model. To generate the ground truth model, a NextEngine desktop 3D scanner, which is based on the laser-ranging technique, is used to scan the same object. Figure 20a shows the reconstructed ground truth model of Beethoven.

Error analysis is done as follows. First, the reconstructed 3D model from our method is manually overlapped with the ground truth model. Second, using a simple ICP technique, we register our model to the ground truth; we use ICP for this refinement because the ground truth model does not have the image plane required to run IPP. Figure 20b shows the registered models. Third, by uniformly sampling the 3D points of our model, we measure the distance to the closest point in the ground truth. As a result, the average error is about 1.2 mm, with a standard deviation of 1.8 mm.
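The third step — uniformly sampling our model and measuring the distance to the closest ground-truth point — maps directly to a nearest-neighbour query. SciPy's cKDTree is used here as an assumed tool; the sampling step is an illustrative parameter.

```python
import numpy as np
from scipy.spatial import cKDTree

def reconstruction_error(model_pts, truth_pts, sample_step=10):
    """Mean and std of closest-point distances (mm) from the
    reconstructed model to the ground-truth model.
    """
    samples = model_pts[::sample_step]    # uniform subsampling
    tree = cKDTree(truth_pts)
    dists, _ = tree.query(samples)        # closest ground-truth point
    return float(dists.mean()), float(dists.std())
```

Note that this one-sided distance slightly underestimates the true surface deviation where the reconstructed model has holes.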

    7 Conclusion

This paper proposes a new hand-held 3D scanning technique. To date, not many investigations have addressed the problem of hand-held 3D scanning. Due to the unstable motion of a human hand and the processing time of pose estimation, conventional 3D shape registration or matching techniques may fail to automatically align a sequence of range images. In this paper, we combine fine and coarse registration of multiple range images to overcome this problem. A sequence of range images obtained by a stereo vision sensor is registered automatically to reconstruct 3D models of real objects. A fast registration refinement technique aligns continuous range images in a pair-wise manner. If the refinement step fails, a coarse registration technique finds the initial pose of wide-baseline range images to resume the refinement step. A graphic interface displaying the status of registration on-line helps a human operator plan the next view of the sensor. Using the proposed technique, we show the 3D reconstruction results of three real objects.

In this paper, we have shown only partial reconstruction results. For complete 3D reconstruction, closing the surfaces of an object is needed. Currently, we need to walk around an object to scan all of its visible surfaces. However, it is still a difficult problem to scan and register all surfaces on-line while walking around the object. Two main difficulties are sensor vibration due to human gait and error propagation. In the future, we will consider the complete 3D modeling problem using a hand-held 3D sensor.

Acknowledgements This work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2007-331-D00423).

References

1. Akbarzadeh, A., Frahm, J.M., Mordohai, P., Clipp, B., Engels, C., Gallup, D., Merrell, P., Phelps, M., Sinha, S., Talton, B., Wang, L., Yang, Q., Stewenius, H., Yang, R., Welch, G., Towles, H., Nister, D., Pollefeys, M.: Towards urban 3D reconstruction from video. In: Proceedings of 3DPVT'06 (2006)
2. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
3. Dias, P., Sequeira, V., Vaz, F., Goncalves, J.G.M.: Registration and fusion of intensity and range data for 3D modelling of real world scenes. In: Fourth International Conference on 3-D Digital Imaging and Modeling, pp. 418–421 (2003)
4. Hilton, A., Illingworth, J.: Geometric fusion for a hand-held 3D sensor. Mach. Vis. Appl. 12(1), 44–51 (2000)
5. Huber, D., Hebert, M.: 3-D modeling using a statistical sensor model and stochastic search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 858–865 (2003)
6. Jaeggli, T., Koninckx, T.P., Van Gool, L.: Online 3D acquisition and model integration. In: IEEE International Workshop on Projector-Camera Systems, ICCV'03, CD-ROM proceedings (2003)
7. Johnson, A.E., Kang, S.B.: Registration and integration of textured 3D data. Image Vis. Comput. 17(2), 135–147 (1999)
8. Johnson, A.: Spin-images: a representation for 3-D surface matching. Technical Report CMU-RI-TR-97-47, Carnegie Mellon University (1997)
9. Levoy, M., Pulli, K., Curless, B., Rusinkiewicz, S., Koller, D., Pereira, L., Ginzton, M., Anderson, S., Davis, J., Ginsberg, J., Shade, J., Fulk, D.: The Digital Michelangelo Project: 3D scanning of large statues. In: SIGGRAPH, pp. 131–144 (2000)
10. Liu, Y., Heidrich, W.: Interactive 3D model acquisition and registration. In: Proceedings of the 11th Pacific Conference on Computer Graphics and Applications, pp. 115–122 (2003)
11. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)
12. Matabosch, C., Fofi, D., Salvi, J., Batlle, E.: Registration of surfaces minimizing error propagation for a one-shot multi-slit hand-held scanner. Pattern Recogn. 41(6), 2055–2067 (2008)
13. Park, S.Y., Subbarao, M.: An accurate and fast point-to-plane registration technique. Pattern Recogn. Lett. 24(16), 2967–2976 (2003)
14. Park, S.Y., Baek, J.: Online registration of multi-view range images using geometric and photometric feature tracking. In: The 6th International Conference on 3-D Digital Imaging and Modeling (3DIM) (2007)
15. Popescu, V., Sacks, E., Bahmutov, G.: The model camera: a hand-held device for interactive modeling. In: Proceedings of 3DIM'03, pp. 285–292 (2003)
16. Popescu, V., Sacks, E., Bahmutov, G.: Interactive modeling from dense color and sparse depth. In: Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT) (2004)
17. Rusinkiewicz, S., Hall-Holt, O., Levoy, M.: Real-time 3D model acquisition. Proc. SIGGRAPH 21(3), 438–446 (2002)
18. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152 (2001)
19. Se, S., Jasiobedzki, P.: Instant scene modeler for crime scene reconstruction. IEEE Conf. Comput. Vis. Pattern Recogn. 3, 123–123 (2005)
20. Shi, J., Tomasi, C.: Good features to track. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600 (1994)
21. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University (1991)
22. Urfalioglu, O., Mikulastik, P., Stegmann, I.: Scale invariant robust registration of 3D-point data and a triangle mesh by global optimization. In: Proceedings of the 8th International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2006), LNCS, vol. 127, pp. 1059–1070 (2006)
23. Yoshida, K., Saito, H.: Registration of range images using texture of high-resolution color images. In: Proceedings of IAPR Workshop on Machine Vision Applications (MVA2002) (2002)
24. Yun, S.U., Min, D., Sohn, K.: 3D scene reconstruction system with hand-held stereo cameras. In: 3DTV Conference, pp. 1–4 (2007)
25. http://www.ces.clemson.edu/~stb/klt/
26. http://www.ptgrey.com