Hand-Held 3D Scanning


Machine Vision and Applications (2011) 22:563–579

DOI 10.1007/s00138-010-0248-1

ORIGINAL PAPER

Hand-held 3D scanning based on coarse and fine registration of multiple range images

Soon-Yong Park · Jaewon Baek · Jaekyoung Moon

Received: 10 March 2009 / Revised: 25 November 2009 / Accepted: 14 January 2010 / Published online: 9 February 2010

© Springer-Verlag 2010

Abstract A hand-held 3D scanning technique is proposed to reconstruct 3D models of real objects. A sequence of range images captured from a hand-held stereo camera is automatically registered to a reference coordinate system. The automated scanning process consists of two states, coarse and fine registration. At the beginning, the scanning process starts at the fine registration state. A fast and accurate registration refinement technique is used to align range images in a pair-wise manner. If the refinement technique fails, the process changes to the coarse registration state. A feature-based coarse registration technique is proposed to find correspondences between the last successful frame and the current frame. If the coarse registration succeeds, the process returns to the fine registration state. A fast point-to-plane refinement technique is employed for shape-based registration. After the shape-based alignment, a texture-based refinement technique matches texture features to enhance the visual appearance of the reconstructed models. Through a graphic and video display, a human operator adjusts the pose of the camera to change the view of the next acquisition. Experimental results show that 3D models of real objects are reconstructed from sequences of range images.

    S.-Y. Park (B)

    Department of Computer Engineering,

    Kyungpook National University, Daegu, 702-701, Korea

    e-mail: [email protected]

    J. Baek

    NeoMtel Incorporation, Seoul, 135-080, Korea

    J. Moon

    School of Electronics, Electrical and Computer Science,

    Kyungpook National University, Daegu, 702-701, Korea

Keywords Hand-held 3D scanning · Multi-view range images · Registration refinement · Coarse registration

    1 Introduction

Three-dimensional (3D) range data of an object can be acquired by several sensing techniques, such as laser ranging, structured light, and stereo vision. Due to recent advances in such sensing techniques, multi-view 3D reconstruction of real objects is of great interest in computer vision and computer graphics research. To reconstruct a complete 3D model of an object, multiple range images should be acquired from different viewing directions to obtain the partial shapes of the object. Then, the range images need to be merged to obtain the 3D model. In general, multi-view 3D reconstruction refers to the complete process of 3D model generation from multiple range data.

Multi-view reconstruction, however, is a very time-consuming task. One reason is the acquisition time of multiple range images. Even though some real-time 3D acquisition systems have been introduced recently, general 3D ranging techniques require a lot of time to obtain multiple range images. For example, the laser ranging technique requires many images of laser stripes even for a single-view reconstruction; therefore, reconstruction of a large scene usually takes several hours. Similarly, the structured light technique needs several pictures of specially designed patterns to obtain a single range image. The stereo vision technique has inherent matching problems, which are complex and time-consuming to solve.

Another reason is the alignment problem of multi-view range images. Each range image captured by a range sensor is represented by an independent coordinate system regardless


of sensor pose. Therefore, range images in different coordinate systems should be aligned with respect to a common coordinate system before being merged into a single 3D model. This process is called registration. Sometimes, the initial poses of range images are known, if they are captured by a calibrated range sensor. However, the initial poses should be refined to reconstruct accurate 3D models. Registration refinement is a very important task in multi-view 3D reconstruction, since the accuracy of a reconstructed 3D model is directly affected by the accuracy of refinement. For these reasons, many registration refinement techniques have been investigated in recent years.

Besl and McKay [2] propose the ICP (Iterative Closest Point) technique. Registration refinement minimizes the transformation error between matching conjugates on different range images (actually, range surfaces). ICP assumes the closest points between them to be the matching conjugates. Based on the ICP technique, various extensions have been investigated. Some of them are as follows. Johnson and Kang [7] modify the ICP technique to combine color information in registration. Levoy et al. [9] introduce a voxel-based registration technique. Their technique needs triangulation of range images to measure the signed distance from a voxel to overlapping range images. Huber and Hebert [5] introduce a graph-based registration technique which can work without initial poses of range images. Urfalioglu et al. [22] use a global optimization technique for uncalibrated registration. ICP can also be used for large-scale city modeling: Akbarzadeh et al. [1] combine INS and GPS data to register multi-view images for city modeling.

Recently, there have been some investigations to reduce the time of multi-view 3D reconstruction based on on-line or interactive methods. Those methods fall into two scanning types. The first type uses a hand-held sensor to scan a fixed object. The other type uses a fixed sensor to scan a hand-held object. Figure 1 shows the two different types

Fig. 1 Registration problem of a hand-held 3D sensor

of on-line scanning. In the left of the figure, an object is held by hand and a fixed sensor is used to capture range images. Suppose the object in hand is rotated by a small angle and two range images are obtained by the sensor before and after rotation. Then, the two poses of the object in the images are close enough to run a refinement algorithm to align the images. This is due to the fixed camera coordinate system. On the contrary, suppose a hand-held sensor is rotated by the same angle, as shown in the right of the figure; then the displacement of the object in the two range images becomes much larger than that of the fixed sensor, due to the rotation of the camera coordinate system.

Because each range image of a hand-held sensor is represented by an independent camera coordinate system, unstable scanning motion can cause registration failure. Figure 2 shows an example. Here, we scan an object from the same distance and direction but at different scanning speeds. In Fig. 2a, registration of the range images fails; thus the figure shows incorrect positions and orientations of the sensor (small white dots are camera centers and short red lines are their orientations). With a slower scanning speed, aligned range images and correct camera poses are obtained, as shown in Fig. 2b.

On-line 3D modeling using a hand-held sensor has the inherent problems mentioned in the previous paragraphs. For this reason, only a few investigations address on-line 3D scanning or modeling problems. Liu and Heidrich [10] propose a stereo-based on-line registration system based on real-time processing hardware. Jaeggli and Koninckx [6] acquire multi-view range images using a pattern projector and register them on-line. Their approach uses a fixed sensor and a turntable to rotate an object; therefore, initial poses of the range images are known in advance. Popescu et al. [15,16] propose a real-time modeling system using a scanning rig which consists of calibrated laser dots and a video camera. The 3D orientations of the projected laser dots are recorded and registered at about five frames per second. Their approach reconstructs 3D surface models by triangulation of sparse 3D points. Rusinkiewicz et al. [17] propose a real-time registration system using a pattern projector and a video camera. The pattern projector projects coded patterns onto a hand-held object, and 3D point clouds of the object are registered in near real-time using a point-to-projection refinement technique. Se and Jasiobedzki [19] introduce a 3D modeling system using a hand-held stereo camera. They use texture features to register a sequence of range images for crime scene reconstruction. Yun et al. [24] use a hand-held stereo camera to acquire range images of indoor scenes. Their off-line process of 3D reconstruction and model registration uses the well-known SIFT algorithm. Hilton and Illingworth [4] use a laser sensor attached to the end of an articulated robotic arm. The arm has six degrees of freedom, and the pose of the sensor is measured with respect to a global coordinate


Fig. 2 a Erroneous registration, b successful registration

system. Therefore, their approach does not need to consider the registration problem. Matabosch et al. [12] propose an on-line registration technique which minimizes the propagation error inherent in pair-wise registration. They use a point-to-plane registration technique similar to that used in our method.

In this paper, we propose an on-line 3D scanning technique based on a hand-held stereo sensor. A sequence of range images captured from the sensor is registered on-line. The on-line scanning process consists of two states, refinement and coarse registration. The process begins at the refinement state. A registration refinement technique continuously registers range images in a pair-wise manner. To overcome refinement failure due to unstable hand motion, a fast coarse registration technique is proposed. If the refinement technique fails, the scanning process changes to the coarse registration state. In this state, we match shape features between two different range images and register them using the matching results. To reduce matching time, we sample depth edges as shape features. The matching process is done in a hierarchical manner for fast processing.

The coarse registration estimates the initial pose between a pair of range images so that the refinement technique can resume. Once the initial pose is close enough, the refinement step begins to register subsequent frames again. The refinement technique uses both 3D shape and texture information for accurate 3D model reconstruction. Texture features sampled by the KLT tracker are used in both shape-based and texture-based registration [21]. Using a graphic and video display, a human operator can see the registered range images and plan the next view interactively. Such interaction enhances the 3D scanning performance.

Experimental results show that the proposed technique can register a long sequence of range images. Using the stereo sensor, we register range images at 1.2 frames per second. Error analysis of the 3D reconstruction results shows a 1.2 mm average error with a 1.8 mm standard deviation. 3D models of real indoor and outdoor objects are also shown.

    2 System overview

Figure 3 shows a flow diagram of the proposed 3D scanning technique. Acquisition of multi-view range images is done by a stereo vision camera, the Bumblebee from Point Grey [26]. Both range and texture images are acquired simultaneously from the camera. To separate foreground objects from the background, we remove portions of the range images which are farther or closer than the working range of 0.3 to 2 m. A range image captured from the camera is registered to the previously aligned range image; therefore, all acquired range images are registered in a pair-wise manner. The first frame from the camera becomes the reference frame, and the others are registered to it.
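The working-range cut described above amounts to a simple depth mask; a minimal sketch, assuming a metric depth map stored as a NumPy array and NaN as the "removed" marker (the 0.3–2 m window is the paper's, the data layout is ours):

```python
import numpy as np

def extract_foreground(depth_m, near=0.3, far=2.0):
    """Remove range pixels outside the working range (values in metres).

    Pixels closer than `near` or farther than `far` are set to NaN so that
    later feature-selection steps can skip them.  The NaN convention is an
    illustrative assumption, not taken from the paper.
    """
    out = depth_m.astype(float).copy()
    out[(out < near) | (out > far)] = np.nan
    return out

depth = np.array([[0.1, 0.5],
                  [1.8, 3.2]])      # one too-close and one too-far pixel
fg = extract_foreground(depth)
```

After masking, only the 0.5 m and 1.8 m pixels remain valid.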

For pair-wise registration, 3D points from two range images are sampled. The samples are chosen by the feature selection routine of the KLT tracker [20,21]. Because the KLT features are usually sampled from high-contrast textures, they are good features to track in a 2D sequence with small motion. Therefore, later in the modeling process, we combine them with the shape-based registration refinement technique.
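The KLT feature selector ranks pixels by the minimum eigenvalue of the local gradient structure tensor (the Shi–Tomasi criterion). A pure-NumPy sketch of that criterion, with illustrative window size and feature count (the paper uses the standard KLT implementation, not this toy version):

```python
import numpy as np

def shi_tomasi_corners(img, k=5, win=3):
    """Return the (row, col) of the k pixels with the largest minimum
    eigenvalue of the local gradient structure tensor -- the criterion
    the KLT feature selector uses."""
    gy, gx = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = gx * gx, gy * gy, gx * gy
    h, w = img.shape
    r = win // 2
    scores = np.zeros_like(img, dtype=float)
    for y in range(r, h - r):
        for x in range(r, w - r):
            sxx = Ixx[y - r:y + r + 1, x - r:x + r + 1].sum()
            syy = Iyy[y - r:y + r + 1, x - r:x + r + 1].sum()
            sxy = Ixy[y - r:y + r + 1, x - r:x + r + 1].sum()
            # min eigenvalue of the 2x2 tensor [[sxx, sxy], [sxy, syy]]
            scores[y, x] = 0.5 * (sxx + syy - np.hypot(sxx - syy, 2 * sxy))
    idx = np.argsort(scores.ravel())[::-1][:k]
    return [divmod(int(i), w) for i in idx]

# a bright square: its four corners carry gradient energy in both directions
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
top = shi_tomasi_corners(img, k=4)
```

Edge midpoints score near zero (one gradient direction only), so the selected features cluster at the square's corners.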

To use 3D features associated with the 2D features, we remove KLT features which have no depth values. Because background ranges are removed beforehand, 2D features in the background are removed as well. 3D features on the range images are registered iteratively by shape-based registration followed by texture-based registration. For the given 3D features in the current range image, their correspondences are determined by a point-to-plane registration technique [13]. The transformation between the current and


Fig. 3 Flow diagram of the proposed 3D registration technique (range/texture acquisition and foreground extraction, followed by 2D/3D feature selection, point-to-plane and texture-based registration in the fine state, and depth edge sampling and matching in the coarse state; the registration error and the number of control points, fewer than 10, switch between the two modes)

the previous range image is derived in a least-squares manner.

After shape-based registration, a modified KLT tracker refines the registration result again. 2D features are used in the texture-based registration step. Owing to the shape-based registration refinement, the projections of the 2D features in one texture image are close to their correspondences in the other image; thus, the processing time of feature matching is reduced. We modified the original KLT tracking algorithm to make the correspondence search fast and accurate [14].

The 3D scanning scheme consists of two states. As shown in Fig. 4, the scanning scheme starts at the fine registration state. In each registration refinement step, we measure the registration error to determine whether the alignment is successful. If the current registration is determined to have failed, we change the scanning state to coarse registration. Based on our experiments, once a human operator fails to register a range image, it is difficult for the operator to bring the sensor very close to the last successful pose. Even though a graphical user interface is displayed on-line during acquisition, a refinement algorithm alone is not enough to recover from the failure. Instead, we introduce a coarse registration technique. After successful coarse registration, we resume registration refinement. The scanning process continues until enough range images are obtained and registered.
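The two-state control flow above can be sketched as a tiny state machine; the success flags stand in for the registration-error test on the current frame, and all names are illustrative:

```python
from enum import Enum

class State(Enum):
    FINE = "fine registration"
    COARSE = "coarse registration"

def scan_step(state, fine_ok, coarse_ok):
    """One transition of the scanning state machine (sketch of Fig. 4).

    `fine_ok` / `coarse_ok` stand in for the per-frame registration-error
    check; the real system derives them from the measured pose error.
    """
    if state is State.FINE:
        return State.FINE if fine_ok else State.COARSE
    return State.FINE if coarse_ok else State.COARSE

# refinement fails -> fall back to coarse registration, then recover
s = State.FINE
s = scan_step(s, fine_ok=False, coarse_ok=False)   # drop to COARSE
s = scan_step(s, fine_ok=False, coarse_ok=True)    # coarse success -> FINE
```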

Fig. 4 State diagram of the proposed 3D scanning process (pair-wise refinement and coarse registration states, with success/fail transitions driven by the registration error)


    3 Registration refinement

    3.1 Shape-based range image refinement

In this section, we describe a shape-based registration refinement technique. In general, there are three main categories of shape-based registration techniques; Figure 5 shows simple diagrams of them. In point-to-point registration, the conjugate of a control point p on the source surface is determined as the closest point q on the destination surface. The error metric ds is the distance between the two control points.

Point-to-plane registration is another common technique. It searches for the intersection on the destination surface along the normal vector of the source point. As shown in Fig. 5b, the destination control point q' is the projection of p onto the tangent plane at q, which is the intersection along the normal of p. The point-to-projection approach is known to be a fast registration technique. As shown in Fig. 5c, this approach determines a point q, the conjugate of a source point p, by forward-projecting p from the point of view of the destination, OQ. To determine the projection point, p is first backward-projected to a 2D point pQ on the range image plane of the destination surface, and then pQ is forward-projected to the destination surface to get q. This algorithm is very fast because it does not include any search step to find the correspondence. However, one of its disadvantages is that the result of registration is not as accurate as those of the others [17].

In this paper, we use a point-to-plane technique called the IPP (iterative projection point) method for shape-based registration [13]. This technique is combined with the point-to-projection technique to reduce processing time. Let us briefly explain the IPP method. To align two surfaces S and D, shown in Fig. 6, we need to find correspondences between the two surfaces. For example, P0 in S corresponds to Q in surface D, which is the intersection of the normal vector at P0. First, we project P0 into ID, the 2D image of surface D, and find the coordinate pD. Then we can find the point Qr on surface D that corresponds to the coordinate pD using the range image of D. A 3D point P1 is found by projecting Qr onto the normal of P0. Applying this iteratively n times, we can find Q, which is the convergence point of Pn. Q is then determined by the

Fig. 5 Three categories of registration techniques. a Point-to-point, b point-to-plane, c point-to-projection

Fig. 6 Iterative projection point algorithm

projection of P0 onto the tangent plane of Q. Using K corresponding pairs Pk0 and Qk, we find the transformation matrix T = [R|t] that minimizes the registration error between the two point sets using Eq. (1). We use the SVD (singular value decomposition) method to solve the equation.

\epsilon = \sum_{k=1}^{K} \| Q_k - (R P_0^k + t) \|^2. (1)
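Once the correspondences are fixed, Eq. (1) has the standard closed-form SVD solution (the Kabsch alignment); a minimal sketch assuming NumPy and (K, 3) arrays of matched points, with illustrative names:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares [R|t] minimizing sum_k ||Q_k - (R P_k + t)||^2.

    Standard SVD solution of Eq. (1); P, Q are (K, 3) arrays of matched
    control points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # proper rotation (det = +1)
    t = cq - R @ cp
    return R, t

# recover a known 90-degree rotation about z plus a translation
rng = np.random.default_rng(0)
P = rng.standard_normal((20, 3))
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
Q = P @ Rz.T + np.array([0.5, -1.0, 2.0])
R, t = rigid_transform(P, Q)
```

The determinant correction `D` guards against the reflection case that plain `V U^T` can produce.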

This process calculates the Euclidean transformation matrix T between the two surfaces repeatedly. Registration error is measured using the rotation R and translation t of the transformation. We find that the registration error typically no longer decreases after about 30 iterations. In general, the source control point set P can be selected by sampling the source surface randomly or uniformly, and filtered by some constraints to delete unreliable control points. The selection of IPP control points is described in the next section.
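The IPP correspondence search itself can be sketched for a simplified setting: here an orthographic height map `depth(x, y)` stands in for the calibrated destination range image, which is an assumption of this sketch, not the paper's stereo geometry. Each iteration back-projects the current estimate to the range map, reads the surface point Q_r there, and projects Q_r back onto the normal line of P0:

```python
import numpy as np

def ipp_correspondence(p0, n, depth, iters=30):
    """Iterative projection point (IPP) sketch for an orthographic range map.

    The fixed point of the iteration approximates the intersection Q of the
    normal line of p0 with the destination surface."""
    p0 = np.asarray(p0, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    p = p0.copy()
    for _ in range(iters):
        # backward projection to image coordinates, forward projection to surface
        qr = np.array([p[0], p[1], depth(p[0], p[1])])
        # project Q_r onto the normal line through p0
        p = p0 + np.dot(qr - p0, n) * n
    return p

# destination surface: the plane z = 0.1 x + 0.2 y + 1
plane = lambda x, y: 0.1 * x + 0.2 * y + 1.0
p0 = np.array([0.0, 0.0, 0.0])
n = np.array([0.3, 0.1, 1.0])
q = ipp_correspondence(p0, n, plane)
```

For a gently sloped surface the iteration contracts quickly, consistent with the roughly 30 iterations the paper reports.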

    3.2 Selection of IPP control points

To run IPP, some 3D points must be sampled as control points. The sampling of control points usually affects registration performance [18]. In this paper, a couple of constraints are applied to select the control points, as follows.

- Texture features of the KLT tracker are used as the control points.
- Invalid features are removed if they are in the background or have no 3D correspondences. If the distance between 3D correspondences is too long, they are regarded as invalid correspondences.
- As shown in the left of Fig. 7, only 2D features in the object area are used in both the shape-based and texture-based refinement steps.
- In the right of the figure, the normal vector of each point is compared with that of the viewpoint to remove unreliable


Fig. 7 Feature point selection. Reliable KLT features are sampled and out-of-range points are removed

Fig. 8 a Pair-wise registration problem, b initialization using the previous transformation

features. We remove a control point if the dot product of the two vectors is greater than 0.5.
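The constraints above reduce to a per-feature filter; a sketch assuming parallel arrays of features, depths, and unit normals (the data layout and the sign convention of the viewing direction are ours, the 0.5 dot-product threshold is the paper's):

```python
import numpy as np

def select_control_points(features, depths, normals, view_dir, thresh=0.5):
    """Filter KLT features into IPP control points (sketch of Sect. 3.2).

    Keeps features that have a valid depth and pass the normal/viewpoint
    dot-product test stated in the paper."""
    view_dir = np.asarray(view_dir, float)
    view_dir = view_dir / np.linalg.norm(view_dir)
    kept = []
    for f, d, n in zip(features, depths, normals):
        if d is None or np.isnan(d):        # background / no 3D correspondence
            continue
        n = np.asarray(n, float)
        n = n / np.linalg.norm(n)
        if np.dot(n, view_dir) > thresh:    # unreliable normal (paper's test)
            continue
        kept.append(f)
    return kept

feats = ["a", "b", "c"]
depths = [1.0, float("nan"), 1.2]
normals = [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0), (0.0, 0.0, 1.0)]
kept = select_control_points(feats, depths, normals, view_dir=(0.0, 0.0, 1.0))
```

Feature "b" is dropped for its missing depth and "c" for failing the normal test, leaving only "a".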

    3.3 Pair-wise registration

When consecutive range images are registered to a reference coordinate system, the initial registration error needs to be small. Figure 8a shows a simple case. After shape Sb is registered to Sa, the next shape Sc should be registered to Sb, which is already aligned. However, there is a large initial error between Sb and Sc, which makes it difficult for shape Sc to be aligned to Sb. If the initial registration error εi between two range images is large, the refinement step is likely to fail to align them.

A simple solution to this problem is to transform the current range image based on the previous alignment result. This brings the current range image into the coordinates of the previous range image, which is already in the common coordinate system. In Fig. 8b, the transformation Tab which brings shape Sb to Sa is applied to surface Sc to initialize registration.
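The initialization of Fig. 8b is just the application of the previous 4×4 transform to the incoming point set; a minimal sketch, where the homogeneous-matrix convention is an assumption for illustration:

```python
import numpy as np

def initialize_with_previous(points_c, T_ab):
    """Apply the previous alignment T_ab (which brought S_b onto S_a) to the
    points of the incoming frame S_c, so refinement starts from a small
    initial error (Fig. 8b)."""
    pts_h = np.c_[points_c, np.ones(len(points_c))]   # to homogeneous coords
    return (pts_h @ T_ab.T)[:, :3]

# translation-only example: shift the new frame by the previous motion estimate
T_ab = np.eye(4)
T_ab[:3, 3] = [0.0, 0.0, -0.1]          # previous inter-frame translation
S_c = np.array([[0.0, 0.0, 1.0],
                [1.0, 1.0, 1.5]])
S_c0 = initialize_with_previous(S_c, T_ab)
```

Under a roughly constant scanning motion, the previous transform is a good predictor of the current one, which is the assumption this initialization relies on.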

    3.4 Texture-based registration method

Texture-based registration techniques employ 2D feature correspondences to estimate the 3D pose between range images

Fig. 9 Comparison of the KLT search range a before and b after shape-based registration

[3,23,24]. When there is a large pose difference, tracking 2D texture features is not easy due to the wide baseline between views. In consequence, tracking time increases and the success rate decreases. One may filter out unsuccessful features by using a RANSAC algorithm; however, this also takes extra time for pose estimation. For accurate tracking, the sizes of the search region and tracking window must be traded off against tracking performance and time.

In this paper, we improve the performance of correspondence tracking by reducing the search range. The original KLT tracking algorithm is modified by employing the results of shape-based registration. In the original KLT tracker, the coordinates of the starting point of the correspondence search in the current image are always the same as those in the previous image. In Fig. 9a, a feature point pS in image plane IS is shown also in the other image plane ID. The original KLT tracker starts searching for its correspondence qD from the coordinates of pS. Therefore, the existing KLT tracker needs a wide search area to find the correspondence if the images are obtained from very different viewing directions.

Figure 9b shows the modified KLT method used in our experiments. When the range of a 2D feature pS is denoted as rS(pS), we can compute p'S from the projection of rS(pS) to


Fig. 10 Feature tracking comparison. a Modified KLT, b original KLT

ID. When the range image rS is registered by transformation TS, it is projected to image ID by multiplying by the perspective projection matrix M. The projection matrix is obtained from the calibrated camera in advance.

p'_S = M T_S r_S(p_S) (2)
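Equation (2) is a standard projective prediction; a sketch with toy matrices (identity pose, unit-focal pinhole camera), where all values are illustrative:

```python
import numpy as np

def predict_feature(r_s, T_s, M):
    """Predict the KLT search start p'_S = M T_S r_S(p_S) of Eq. (2).

    r_s is the homogeneous 3D point behind feature p_S, T_s the 4x4 pose
    from shape-based registration, M the 3x4 perspective projection matrix
    of the calibrated camera."""
    x = M @ (T_s @ r_s)          # project the registered 3D point
    return x[:2] / x[2]          # dehomogenize to pixel coordinates

# identity pose, unit-focal pinhole camera
M = np.hstack([np.eye(3), np.zeros((3, 1))])
r = np.array([0.2, 0.4, 2.0, 1.0])
p = predict_feature(r, np.eye(4), M)   # point at depth 2 projects to (0.1, 0.2)
```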

When the shape-based registration result is considered, the starting coordinates of the KLT tracker become very close to the correspondence. This, however, assumes that the shape-based registration has a small registration error. In most cases, the registration result is accurate enough to bring the 2D correspondences very close together. Thus, using a small search region, we can track texture features very fast to further refine the pose.

Figure 10 shows an example of texture feature tracking. In the figure, the green dots represent texture features and the white lines represent tracking results between two consecutive images. Using a smaller tracking window than that of the original KLT, almost all features are successfully tracked by the modified KLT. Because most features are successfully tracked, we can estimate the transformation without any RANSAC algorithm.
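The benefit of the predicted starting point can be illustrated with a toy tracker: plain SSD template matching over a tiny search window stands in for the KLT solver here, an assumption of this sketch rather than the paper's implementation:

```python
import numpy as np

def track_feature(img_prev, img_cur, p_prev, p_pred, win=2, search=2):
    """Track one feature with a small search window around its prediction.

    Because shape-based registration gives a prediction p_pred close to the
    true correspondence, an SSD search over a +/-`search` pixel window
    suffices (the original KLT would need a much larger one)."""
    y0, x0 = p_prev
    tmpl = img_prev[y0 - win:y0 + win + 1, x0 - win:x0 + win + 1].astype(float)
    best, best_cost = p_pred, np.inf
    yp, xp = p_pred
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = yp + dy, xp + dx
            cand = img_cur[y - win:y + win + 1, x - win:x + win + 1].astype(float)
            cost = ((cand - tmpl) ** 2).sum()
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best

# a bright blob shifted by (1, 2); the prediction is already near the new spot
rng = np.random.default_rng(1)
a = rng.random((20, 20))
a[8:11, 8:11] += 5.0
b = np.roll(np.roll(a, 1, axis=0), 2, axis=1)
found = track_feature(a, b, p_prev=(9, 9), p_pred=(10, 10))
```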

    4 Handling registration failure

Hand-held 3D scanning is a convenient way to reconstruct 3D models of real objects. However, one drawback of this method is that the view of the 3D sensor is controlled by hand. One may think that hand motion is easily controllable by a human operator. However, in terms of 3D registration, a very small motion of the hand yields a large displacement between consecutive images. For this reason, conventional 3D scanning is done with fixed, off-the-shelf systems.

To reconstruct the 3D shapes of real objects, we need to acquire as many range and texture images as possible. Therefore, we need to cope with the hand motion during scanning. During the registration refinement state, we check whether the alignment of the current image is successful. If not, we move the state to coarse registration. In the coarse registration state, a range image is captured and matched with the previously aligned range image. If this is not successful, the capturing and matching steps are repeated until it is. Once the coarse registration is successful, the scanning state returns to the refinement step.

The decision of coarse registration success is made by measuring the

pose error between the correspondences. Let T_{n-1} and T_n be the transformation matrices of the (n-1)th and nth range images. Then the transformation

T_{n-1,n} = T_{n-1} T_n^{-1} (3)

can be considered as an error between them. To measure the registration error between the (n-1)th and nth range images, we apply T_{n-1} and T_n^{-1} to the corresponding range images and compute the rotation and translation error between them. From the matrix T_{n-1,n} = [r_{ij} | t_i], the rotation and translation errors are computed as follows.

\epsilon_R = \frac{\sum_{i=0}^{2} (I_{ii} - r_{ii})^2}{3}, (4)

\epsilon_t = \frac{\sum_{k=1}^{K} \| Q_k - P_0^k \|}{K}, (5)

where K is the number of correspondences between the images. The rotation error is measured by the difference between the identity matrix and the rotation matrix. Rather than using the translation vector in T_{n-1,n}, we use the average distance between the correspondences as the measure of translation. As mentioned in an earlier section, we provide a human operator with a graphic user interface to plan the view of the camera and check the status of on-line scanning.
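Equations (3)–(5) can be sketched directly; the exact normalization of the rotation error in Eq. (4) is reconstructed from the garbled source and may differ from the authors' code, as the comment notes:

```python
import numpy as np

def registration_error(T_prev, T_cur, P0, Q):
    """Rotation/translation error check of Eqs. (3)-(5) (sketch).

    T_{n-1,n} = T_prev @ inv(T_cur) measures the pose discrepancy; the
    rotation error compares the diagonal of its rotation part with the
    identity, and the translation error is the mean distance between the
    correspondences Q_k and P0_k.  The Eq. (4) normalization is an
    assumption reconstructed from the text."""
    T = T_prev @ np.linalg.inv(T_cur)                           # Eq. (3)
    r = T[:3, :3]
    err_R = sum((1.0 - r[i, i]) ** 2 for i in range(3)) / 3.0   # Eq. (4)
    err_t = np.mean(np.linalg.norm(Q - P0, axis=1))             # Eq. (5)
    return err_R, err_t

# identical poses give zero rotation error; points offset by 0.1 per axis
T = np.eye(4)
P0 = np.array([[0.0, 0.0, 1.0],
               [1.0, 0.0, 1.0]])
err_R, err_t = registration_error(T, T, P0, P0 + 0.1)
```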

    5 Coarse registration

    5.1 Sampled depth-edge block matching

In this section, we propose a coarse registration technique to match two wide-baseline range images. We call our coarse


Fig. 11 Matching strategy of SDBM

Fig. 12 Sampling of depth edge points. From left: original depth edges, uniform sampling, and complete depth points

Fig. 13 An example of SDBM. a Matched depth points, b before registration, c after registration

registration method SDBM (Sampled Depth edge Block Matching). Suppose there are two range images, source and destination, as in Fig. 11. The source range image can be considered the current image and, similarly, the destination image the last range image already aligned in the reference coordinate system. From the source image, we pick some shape features which lie on the edges of the range image. Edge features in the range image are chosen because they are independent of texture change, and they can be acquired directly from the range image. In this paper, we apply the Sobel filter to find edge features, with a fixed threshold value used to determine them. Determining the


Fig. 14 Ten test pairs of coarse registration. From left: destination and source frames, initial poses, depth and features of the two frames. From top: BT1, BT2, SB1, SB2, SM1, SM2, SC1, SC2, SL1, and SL2

threshold value is not critical in this case, because the edge features are sampled later to reduce the number of features. Given a sample point si in the source, we find a matching point in the destination range image. To find the matching point, we define two search regions, R1 and R2. A square region R1 is defined by placing its center at the same coordinates as those of si. Let one of the depth edge points in R1 be di. Another square region R2 is defined similarly, with


Fig. 15 Results of coarse registration. From left to right: point clouds and textured results of KLT, SPIN, and SDBM. The point clouds show the poses before applying IPP, but the textured results are shown after applying IPP

its center at di. In R2, a matching window WD is defined at each point di, and its cost is measured with respect to WS, the matching window of si. The second search region R2 is defined at every destination point in R1. The matching pair (si, di) is therefore the one yielding the least cost value between WS and WD.

Let nW, nR2, and nR1 be the number of depth points in the matching window, the number of valid points in R2, and


Table 1 Comparison of registration error (mm)

Object  Method  Initial  Coarse  Coarse + IPP  S(uccess)/F(ail)
BT1     KLT     48.3     19.1    0.9           S
        SPIN    63.7     42.3    1.3           S
        SDBM    40.3     10.6    1.2           S
BT2     KLT     43.7     30.9    0.8           S
        SPIN    74.5     64.8    1.1           S
        SDBM    37.1     2.7     0.7           S
SB1     KLT     7.9      -       -             F
        SPIN    115.4    96.9    2.6           S
        SDBM    54.2     43.8    3.9           S
SB2     KLT     48.4     25.9    1.5           S
        SPIN    84.9     79.3    3.1           S
        SDBM    48.5     33.9    2.8           S
SM1     KLT     55.5     21.6    0.5           S
        SPIN    62.8     40.5    2.7           S
        SDBM    57.3     6.6     0.5           S
SM2     KLT     87.8     59.2    5.9           S
        SPIN    138.8    130.9   130.9         F
        SDBM    72.5     64.8    6.9           F
SC1     KLT     42.4     8.7     2.6           S
        SPIN    54.1     24.3    2.2           S
        SDBM    42.4     0.9     0.8           S
SC2     KLT     24.4     8.7     0.7           S
        SPIN    44.7     43.7    2.6           F
        SDBM    24.1     1.2     0.7           S
SL1     KLT     28.8     14.4    4.3           F
        SPIN    38.2     10.0    0.4           S
        SDBM    38.1     0.8     0.3           S
SL2     KLT     30.6     17.9    3.4           F
        SPIN    103.0    88.2    4.0           F
        SDBM    61.1     6.2     0.6           S

    the number of depth edge points in R1, respectively. Then

    computational complexity of finding the matching point is

    O(nw nR2 nR1). Cost C(si , di ) between the two match-

    ingwindows is measuredby themean-normalizedSSD (Sum

    of squared difference) as follows:

\bar{W}_S = \frac{1}{n_W} \sum_{(i,j) \in W_S} r_S(i,j), \qquad
\bar{W}_D = \frac{1}{n_W} \sum_{(i,j) \in W_D} r_D(i,j),        (6)

C(s_i, d_i) = \frac{1}{n_W} \sum_{(i,j) \in W_S} \left( (r_S(i,j) - \bar{W}_S) - (r_D(i,j) - \bar{W}_D) \right)^2,        (7)

where rS(i, j) and rD(i, j) are the depth values at pixel (i, j) of the corresponding matching windows.
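Equations (6) and (7) translate directly into code. The sketch below assumes both windows are given as equally sized dense arrays; hole handling is left out.

```python
import numpy as np

def mean_normalized_ssd(win_s, win_d):
    """Cost C(s_i, d_i) of Eq. (7): mean-normalized SSD between the
    source window W_S and the destination window W_D.

    win_s, win_d: equally sized 2D arrays of depths r_S and r_D.
    """
    n_w = win_s.size
    mean_s = win_s.sum() / n_w      # Eq. (6), source window mean
    mean_d = win_d.sum() / n_w      # Eq. (6), destination window mean
    diff = (win_s - mean_s) - (win_d - mean_d)
    return float((diff ** 2).sum() / n_w)
```

Subtracting each window's own mean makes the cost insensitive to a constant depth offset between the two views.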

Fig. 16 Graphic user interface to assist a user in adjusting the sensor pose

Depth edge points used in the matching algorithm are sampled from the original range images. As mentioned before, only depth edge points are used in order to exploit the high curvature of the edges. To make the matching process fast and reliable, we reject some depth edge points as follows. First, depth edge points are sampled as shown in Fig. 12; in the left image, red-colored (dark-grey) points are edge points sampled from a range image. Second, we uniformly sample again to reduce the number of edge points, as shown in the center of the figure. Finally, we remove incomplete points. Here, we regard a point as incomplete if any neighborhood in its matching window W is a hole.
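The rejection steps above (uniform subsampling of the edge points and discarding points whose matching window touches a hole) might be sketched as follows; the stride and the 15 × 15 window half-size are assumed values.

```python
import numpy as np

def sample_edge_points(depth, edge_mask, stride=4, half_win=7):
    """Subsample depth edge points and drop incomplete ones.

    A point is 'incomplete' if any pixel of its matching window W
    (here 15x15, so half_win=7) is a hole (depth == 0).
    """
    h, w = depth.shape
    points = []
    ys, xs = np.nonzero(edge_mask)
    for y, x in zip(ys, xs):
        if y % stride or x % stride:          # uniform subsampling
            continue
        y0, y1 = y - half_win, y + half_win + 1
        x0, x1 = x - half_win, x + half_win + 1
        if y0 < 0 or x0 < 0 or y1 > h or x1 > w:
            continue                          # window leaves the image
        if (depth[y0:y1, x0:x1] == 0).any():
            continue                          # incomplete: window hits a hole
        points.append((y, x))
    return points
```

Rejecting incomplete points up front keeps the SSD cost of Eq. (7) well defined for every surviving candidate.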

We do the depth point sampling in both range images. The local matching point in R2 is determined by

local(d_i) = \arg\min_{j} \, cost(s_i, d_j \in R_2).        (8)

The final matching point is the global minimum over s_i, which can be computed as

global(s_i) = \arg\min_{j} \, cost(local(d_j) \in R_1).        (9)
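A direct reading of Eqs. (8) and (9) gives the nested search below, with the O(nW · nR2 · nR1) complexity noted earlier. The data layout (candidate lists with precomputed windows) is an assumed simplification of the paper's pipeline.

```python
import numpy as np

def find_matching_point(win_s, candidates_r1, windows_r2, cost):
    """Nested search of Eqs. (8) and (9) (sketch).

    win_s: matching window W_S of the source point s_i.
    candidates_r1: depth edge points d_i inside R1.
    windows_r2: windows_r2[d] lists (point, window) pairs inside
                the region R2 centred at d.
    cost: cost function, e.g. the mean-normalized SSD of Eq. (7).
    Returns (best destination point, best cost).
    """
    best_point, best_cost = None, np.inf
    for d in candidates_r1:                   # outer loop over R1
        # Eq. (8): local minimum within the region R2 around d
        local_point, local_cost = None, np.inf
        for p, win_d in windows_r2[d]:        # inner loop over R2
            c = cost(win_s, win_d)
            if c < local_cost:
                local_point, local_cost = p, c
        # Eq. (9): keep the global minimum over all local minima
        if local_cost < best_cost:
            best_point, best_cost = local_point, local_cost
    return best_point, best_cost
```

As described next in the text, the resulting pair is still rejected if its cost exceeds a fixed threshold.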

Fig. 17 Results of Beethoven: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model

Even though a destination point is found to be the global minimum, its cost C(si, di) should be less than a threshold value; if not, we reject the point. Figure 13 shows an example of coarse matching. In Fig. 13a, solid lines show matching pairs of features between two range images. In Fig. 13b and c, the two 3D shapes are displayed together to compare the poses before and after coarse registration. In this case, 10 matching pairs are used for registration. The matching window size is 15 × 15, and the sizes of R1 and R2 are 100 and 50, respectively. Computing the transformation matrix from the matching pairs is done by the same method as in fine registration. Registration errors before and after registration are 38.54 and 3.58 mm, respectively. The initial registration error is reduced enough to start the refinement.

    5.2 Comparison of coarse matching

The proposed coarse registration technique can be considered a 3D shape matching technique. To match 3D shapes obtained from different views, other conventional shape matching techniques can be considered; the spin image is one example. Conventional 2D matching techniques can also be used, because a texture image is associated with every range image. In this section, we compare the performance of our 3D matching technique with two other techniques, the spin image [8] and KLT [21].

Fig. 18 Results of Sacheonwang: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model

The spin image is a 3D shape matching technique. A spin image is a 2D space of α and β, which are mappings of the 3D measure of a point with respect to its neighboring surface points. To run the spin image technique, the image size is set to 40 × 40 and the length of each bin of the image is set to 8 mm, which covers 160 mm from the measured point. To run KLT, the original KLT algorithm is used and the size of the matching window is set to 15 × 15, the same as that of our matching block W. For a fair comparison, we use the same number of features, which are extracted from SDBM.
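The (α, β) mapping of a spin image can be sketched as follows, based on Johnson's formulation [8]: α is the radial distance from the normal line through the oriented point, β the signed elevation along the normal. The image size and bin length follow the values quoted above; the binning details are illustrative assumptions.

```python
import numpy as np

def spin_image(points, p, n, image_size=40, bin_len=8.0):
    """Accumulate a spin image for the oriented point (p, n) (sketch).

    points: (N, 3) neighbouring surface points (mm).
    p: 3D position of the oriented point; n: its unit normal.
    bin_len=8 mm with a 40x40 image covers 160 mm from p.
    """
    rel = points - p
    beta = rel @ n                                  # elevation along n
    alpha = np.sqrt(np.maximum((rel ** 2).sum(axis=1) - beta ** 2, 0.0))
    img = np.zeros((image_size, image_size))
    # beta is signed, so shift its bin index by half the image height
    i = ((beta / bin_len) + image_size // 2).astype(int)
    j = (alpha / bin_len).astype(int)
    ok = (i >= 0) & (i < image_size) & (j >= 0) & (j < image_size)
    np.add.at(img, (i[ok], j[ok]), 1.0)             # 2D histogram
    return img
```

Because (α, β) discards the azimuth around the normal, the representation is rotation invariant, which is exactly the extra degree of freedom blamed below for SPIN's occasional failures.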

Five objects are used for this comparison, as shown in Fig. 14. The two test frames shown in the first and second columns of the figure are sampled from the video sequence of each object. The corresponding range images are shown in the last two columns. The test range images are sampled so that the initial pose difference becomes too wide to register them with our refinement algorithm. In the range images, the depth features extracted by the SDBM are overlaid. In the third column, the two range images are overlapped to show the initial pose between the two frames.

Fig. 19 Results of Natural scene: a original images, b range images, c selected features, d tracked features, e camera orientations, f reconstructed model

Figure 15 shows the coarse registration results. The first three columns show the results of KLT, SPIN, and SDBM; the second three columns show the same results with textured points. KLT and SDBM show very similar results, while SPIN sometimes yields erroneous results. The main reason is that SPIN has more degrees of freedom than KLT and SDBM due to its rotation and scale invariance. The first three columns of the figure show the pose right after applying the SDBM technique. The last three columns show textured range images after applying IPP, which follows SDBM. Table 1 shows the registration error measured between two range images after applying each matching technique. In the table, the decision of success or failure is given by visual inspection of the results. The registration error is measured as the average distance between matching pairs.
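The error metric of Table 1 — the average distance between matching pairs — is simple to state in code. This sketch assumes the pairs are given as two aligned point arrays in a common coordinate system.

```python
import numpy as np

def registration_error(src, dst):
    """Average Euclidean distance (mm) between matched 3D pairs.

    src, dst: (N, 3) arrays where src[k] and dst[k] form a pair,
    both already expressed in the same coordinate system.
    """
    return float(np.linalg.norm(src - dst, axis=1).mean())
```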

    6 Experimental results

The stereo vision camera generates 15 range images per second. The resolution of a range image is 320 × 240. The proposed 3D scanning technique is applied to two real objects and one natural scene. A computer with a Pentium 3.4 GHz CPU is used. It takes about 800 ms to register each range image when both the shape-based and texture-based registration refinements are applied. When only the shape-based registration is used, about 600 ms is needed.

    6.1 Online graphic interaction

To register range images on-line, it is better for a human operator to view the status of scanning through a graphic display, so that the operator can adjust the view of the next image frames. During 3D scanning, we therefore provide user interaction based on graphic models and image display.

The graphic interaction system presents the current registration status to the user by rendering all range images. The system also shows the position and orientation of each camera by displaying a graphic box and a line, as in Fig. 16. Through the display, the user gets visual feedback to adjust the speed and direction of the camera motion. When registration of a range image fails, the user can adjust the position and orientation of the camera to resume the registration process. In addition, if there are holes in the registered range images, the user can acquire new range images to fill them. In one corner of the display, two images are shown in real time: the current texture image and the previous texture image. The graphical user interface is developed with OpenGL.

    6.2 Reconstruction results

Table 2 Registration and integration time

Object         Registration (s/frame)   Integration (s)
                                        20 frames   40 frames   60 frames
Beethoven      0.75                     330         705         N.A.
Sacheonwang    0.87                     220         N.A.        650
Natural scene  0.91                     390         1105        N.A.

Table 3 Average registration error

Object         Translation (mm)   Rotation
Beethoven      0.75               0.00052
Sacheonwang    0.82               0.00028
Natural scene  4.55               0.00032

The first 3D reconstruction experiment is performed using a plaster model of Beethoven. A total of 40 range images are acquired continuously. In this experiment, the sensor rotates around the object by about 90°. The object is placed in front of a random-dot background, and the background ranges are removed before registration. Figure 17 shows some reconstruction results: Fig. 17a–d show the input images, the range images of the object areas, the selected features, and the feature motions; frames 0, 13, 29, and 39 are shown. In Fig. 17e, the registered point clouds are shown with all the camera positions. In Fig. 17f and g, the front view of the registered range images and an integration result are shown. To integrate the registered range images into a 3D mesh model, we use the Marching Cubes algorithm [11]. For this reconstruction, the voxel size is set to 3 mm, and a total of 309,520 triangles are generated.
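Before Marching Cubes [11] can run, the registered point clouds must be resampled into a voxel grid. A minimal occupancy-style voxelization with 3 mm voxels might look like this; the distance-field construction of the actual pipeline is not specified in the text, so this is only the resampling step.

```python
import numpy as np

def voxelize(points, voxel=3.0):
    """Map registered 3D points (mm) into a dense voxel grid.

    Each occupied cell counts how many points fall into it; a
    Marching Cubes step would then extract an iso-surface from a
    field built on such a grid.
    """
    origin = points.min(axis=0)
    idx = np.floor((points - origin) / voxel).astype(int)
    dims = idx.max(axis=0) + 1
    grid = np.zeros(dims, dtype=np.int32)
    np.add.at(grid, tuple(idx.T), 1)      # accumulate point counts
    return grid, origin
```

The 3 mm voxel size quoted above trades mesh resolution against the number of generated triangles.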

Figure 18 shows the results for another object, called Sacheonwang. The object is in the museum of our campus, and its height is about 1 m. An operator holds the range camera in hand and moves it around the object. Because of hand shaking, it was not easy to take continuous range images without error; nevertheless, a total of 60 range images are acquired and registered. Due to the illumination conditions of the museum, there are more range errors than for Beethoven; however, the experimental results show that its 3D model is reasonably reconstructed. Figure 19 shows the experimental results for Natural scene. Due to the inherent noise of natural objects, some parts of the reconstruction show blur patterns. However, the reconstruction from 40 range images yields the natural 3D shape of the scene.

Table 2 presents the registration and integration times of the three experiments. Registration of a pair of range images takes less than 1 s on average. This is an acceptable speed for online registration, because an operator can move the range camera carefully to acquire accurate range images. Table 3 shows the average translation and rotation errors, measured as explained in Sect. 4. The table shows that the registration error is very small after the registration refinement. The translation error of Natural scene is much higher than that of the others; this is due to the noisy background of the scene.

    6.3 Reconstruction error analysis

Fig. 20 Reconstruction error analysis: a ground truth model of Beethoven, b registration of the reconstructed model (green, light grey) to the ground truth

To analyze the reconstruction error of our 3D scanning, a reconstructed model of Beethoven is compared with a ground truth model. To generate the ground truth model, a NextEngine desktop 3D scanner, which is based on the laser-ranging technique, is used to scan the same object. Figure 20a shows the reconstructed ground truth model of Beethoven.

Error analysis is done as follows. First, the reconstructed 3D model from our method is manually overlapped with the ground truth model. Second, using a simple ICP technique, we register our model to the ground truth; we use ICP for this refinement because the ground truth model does not have the image plane required to run IPP. Figure 20b shows the registered models. Third, by uniformly sampling the 3D points of our model, we measure the distance to the closest point in the ground truth. As a result, the average error is about 1.2 mm, with a standard deviation of 1.8 mm.
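The third step — uniformly sampling our model and measuring the distance to the closest ground-truth point — maps directly to a nearest-neighbour query. SciPy's cKDTree is used here as an assumed tool; the sampling step is an illustrative parameter.

```python
import numpy as np
from scipy.spatial import cKDTree

def reconstruction_error(model_pts, truth_pts, sample_step=10):
    """Mean and std of closest-point distances (mm) from the
    reconstructed model to the ground-truth model.
    """
    samples = model_pts[::sample_step]    # uniform subsampling
    tree = cKDTree(truth_pts)
    dists, _ = tree.query(samples)        # closest ground-truth point
    return float(dists.mean()), float(dists.std())
```

Note that this one-sided distance slightly underestimates the true surface deviation where the reconstructed model has holes.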

    7 Conclusion

This paper proposes a new hand-held 3D scanning technique. To date, not many investigations have addressed the problem of hand-held 3D scanning. Due to the unstable motion of a human hand and the processing time of pose estimation, conventional 3D shape registration or matching techniques may fail to automatically align a sequence of range images. In this paper, we combine fine and coarse registration of multiple range images to overcome this problem. A sequence of range images obtained by a stereo vision sensor is registered automatically to reconstruct 3D models of real objects. A fast registration refinement technique aligns continuous range images in a pair-wise manner. If the refinement step fails, a coarse registration technique finds the initial pose of wide-baseline range images to resume the refinement step. A graphic interface displaying the status of registration on-line helps a human operator plan the next view of the sensor. Using the proposed technique, we show the 3D reconstruction results of three real objects.

In this paper, we have shown only partial reconstruction results. For complete 3D reconstruction, closing the surfaces of an object is needed. Currently, we need to walk around an object to scan all of its visible surfaces. However, it is still a difficult problem to scan and register all surfaces on-line while walking around the object. Two main difficulties are sensor vibration due to human gait and error propagation. In the future, we will consider the complete 3D modeling problem using a hand-held 3D sensor.

Acknowledgements This work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2007-331-D00423).

References

1. Akbarzadeh, A., Frahm, J.M., Mordohai, P., Clipp, B., Engels, C., Gallup, D., Merrell, P., Phelps, M., Sinha, S., Talton, B., Wang, L., Yang, Q., Stewenius, H., Yang, R., Welch, G., Towles, H., Nister, D., Pollefeys, M.: Towards urban 3D reconstruction from video. In: Proceedings of 3DPVT'06 (2006)
2. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
3. Dias, P., Sequeira, V., Vaz, F., Goncalves, J.G.M.: Registration and fusion of intensity and range data for 3D modelling of real world scenes. In: Fourth International Conference on 3-D Digital Imaging and Modeling, pp. 418–421 (2003)
4. Hilton, A., Illingworth, J.: Geometric fusion for a hand-held 3D sensor. Mach. Vis. Appl. 12(1), 44–51 (2000)
5. Huber, D., Hebert, M.: 3-D modeling using a statistical sensor model and stochastic search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 858–865 (2003)
6. Jaeggli, T., Koninckx, T.P., Van Gool, L.: Online 3D acquisition and model integration. In: IEEE International Workshop on Projector-Camera Systems, ICCV'03, CD-ROM proceedings (2003)
7. Johnson, A.E., Kang, S.B.: Registration and integration of textured 3D data. Image Vis. Comput. 17(2), 135–147 (1999)
8. Johnson, A.: Spin-images: a representation for 3-D surface matching. Technical Report CMU-RI-TR-97-47, Carnegie Mellon University (1997)
9. Levoy, M., Pulli, K., Curless, B., Rusinkiewicz, S., Koller, D., Pereira, L., Ginzton, M., Anderson, S., Davis, J., Ginsberg, J., Shade, J., Fulk, D.: The Digital Michelangelo Project: 3D scanning of large statues. In: SIGGRAPH, pp. 131–144 (2000)
10. Liu, Y., Heidrich, W.: Interactive 3D model acquisition and registration. In: Proceedings of the 11th Pacific Conference on Computer Graphics and Applications, pp. 115–122 (2003)
11. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)
12. Matabosch, C., Fofi, D., Salvi, J., Batlle, E.: Registration of surfaces minimizing error propagation for a one-shot multi-slit hand-held scanner. Pattern Recogn. 41(6), 2055–2067 (2008)
13. Park, S.Y., Subbarao, M.: An accurate and fast point-to-plane registration technique. Pattern Recogn. Lett. 24(16), 2967–2976 (2003)
14. Park, S.Y., Baek, J.: Online registration of multi-view range images using geometric and photometric feature tracking. In: The 6th International Conference on 3-D Digital Imaging and Modeling (3DIM) (2007)
15. Popescu, V., Sacks, E., Bahmutov, G.: The model camera: a hand-held device for interactive modeling. In: Proceedings of 3DIM'03, pp. 285–292 (2003)
16. Popescu, V., Sacks, E., Bahmutov, G.: Interactive modeling from dense color and sparse depth. In: Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT) (2004)
17. Rusinkiewicz, S., Hall-Holt, O., Levoy, M.: Real-time 3D model acquisition. Proc. SIGGRAPH 21(3), 438–446 (2002)
18. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152 (2001)
19. Se, S., Jasiobedzki, P.: Instant scene modeler for crime scene reconstruction. IEEE Conf. Comput. Vis. Pattern Recogn. 3, 123–123 (2005)
20. Shi, J., Tomasi, C.: Good features to track. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600 (1994)
21. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University (1991)
22. Urfalioglu, O., Mikulastik, P., Stegmann, I.: Scale invariant robust registration of 3D-point data and a triangle mesh by global optimization. In: Proceedings of the 8th International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2006), LNCS, vol. 127, pp. 1059–1070 (2006)
23. Yoshida, K., Saito, H.: Registration of range images using texture of high-resolution color images. In: Proceedings of IAPR Workshop on Machine Vision Applications (MVA2002) (2002)
24. Yun, S.U., Min, D., Sohn, K.: 3D scene reconstruction system with hand-held stereo cameras. In: 3DTV Conference, pp. 1–4 (2007)
25. http://www.ces.clemson.edu/~stb/klt/
26. http://www.ptgrey.com