
Pattern Recognition 41 (2008) 418–431
www.elsevier.com/locate/pr

Articulated motion reconstruction from feature points

B. Li a,∗, Q. Meng b, H. Holstein c

a Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, M1 5GD, UK
b Department of Computer Science, Loughborough University, Loughborough, LE11 3TU, UK
c Department of Computer Science, University of Wales, Aberystwyth, SY23 3DB, Wales, UK

Received 17 October 2006; received in revised form 25 March 2007; accepted 6 June 2007

Abstract

A fundamental task in reconstructing non-rigid articulated motion from sequences of unstructured feature points is to solve the problem of feature correspondence and motion estimation. This problem is challenging in high-dimensional configuration spaces. In this paper, we propose a general model-based dynamic point matching algorithm to reconstruct freeform non-rigid articulated movements from data presented solely by sparse feature points. The algorithm integrates key-frame-based self-initialising hierarchical segmental matching with inter-frame tracking to achieve computational effectiveness and robustness in the presence of data noise. A dynamic scheme of motion verification, dynamic key-frame-shift identification and backward parent-segment correction, incorporating temporal coherency embedded in inter-frames, is employed to enhance the segment-based spatial matching. Such a spatial–temporal approach ultimately reduces the ambiguity of identification inherent in a single frame. Performance evaluation is provided by a series of empirical analyses using synthetic data. Testing on motion capture data for a common articulated motion, namely human motion, gave feature-point identification and matching without the need for manual intervention, in buffered real-time. These results demonstrate the proposed algorithm to be a candidate for feature-based real-time reconstruction tasks involving self-resuming tracking for articulated motion.
© 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Non-rigid articulated motion; Point pattern matching; Non-rigid pose estimation; Motion tracking and object recognition

1. Introduction

Visual interpretation of non-rigid articulated motion has lately seen somewhat of a renaissance in computer vision and pattern recognition. The motivation for directing existing motion analysis of rigid objects towards non-rigid articulated objects [1,2], especially human motion [3–5], is driven by potential applications such as human–computer interaction, surveillance systems, entertainment and medical studies. A large body of research, dedicated to the task of structure and motion analysis, utilises feature-based methods regardless of parametrisation by points, lines, curves or surfaces. Among these, concise feature-point representation, advantageously abstracting the underlying movement, is usually used as an essential or intermediate correspondence towards the end-product of motion and structure recovery [6–8].

∗ Corresponding author. Tel.: +44 161 247 3598; fax: +44 161 247 1483. E-mail addresses: [email protected] (B. Li), [email protected] (Q. Meng), [email protected] (H. Holstein).

0031-3203/$30.00 © 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2007.06.002

In the context of vision cues via feature-point representation, the spatio–temporal information is notably reduced to a sequence of unidentified points moving over time. To determine the subject's structure, and therefore its underlying skeletal-style movements for the purpose of high-level recognition, two fundamental problems of feature-point tracking and identification need to be solved. Tracking feature points in successive frames has been investigated extensively in the literature [9–12]. However, the identities of the subject feature points are not obtainable from inter-frame tracking alone.

Feature-point identification requires the determination of which point in an observed data frame corresponds to which point in its model, thus allowing recovery of structure. The task addresses the difficult problem of automatic model matching and identification, crucial at the start of tracking or on resumption from tracking loss. Currently, most tracking approaches simplify the problem to incremental pose estimation, relying on manual model fitting at the start of tracking, on an assumption of initial pose similarity and alignment to the model, or on pre-knowledge of a specific motion from which to infer an initial pose [5,13]. In this sense, the general recovery of non-rigid articulated motion solely from feature points still remains an open problem. There is a relative dearth of algorithmic self-initialisation for articulated motion reconstruction from only a collection of sparse feature points.

Motivated by these observations, we present a dynamic segment-based hierarchical point matching (DSHPM) algorithm to address self-initialising articulated motion reconstruction from sparse feature points. The articulated motion we are considering describes general segmental jointed freeform movement. The motion of each segment can be considered rigid or nearly rigid, but the motion of the object as a whole is high-dimensionally non-rigid. In our work, the articulated model of an observed subject is known a priori, suggesting a model-based approach. As a general solution to the problem, the algorithm only assumes availability of feature-point motion data, such as obtained in our experiments via a marker-based motion capture system. We do not make the usual simplifying assumptions of model-pose similarity or restricted motion class for tracking initialisation, nor do we require absence of data noise. The algorithm aims to establish one-to-one matches between the model point-set and its freeform motion data, to reconstruct the underlying articulated movements in buffered real-time.

2. Related work

The problem of automatically identifying feature points to retrieve underlying articulated movement can be inherently difficult for a number of reasons: (1) the possibly high global dimensionality needed to depict the articulated structure; (2) relaxation of segment rigidity to allow for limited distortion; (3) data corruption due to missing (occluded) and extra (introduced by the process of feature extraction) data; (4) unrestricted and arbitrary poses in freeform movements; (5) the requirement of self-initialising tracking and identification; and (6) a computational cost compatible with real-time operation. While few works have attempted to address all these issues in the context of sparse feature-point representation (an early exploratory paper was published in Ref. [14]), many studies have attacked various aspects of the problem. In this section, we review techniques related to the problem in two categories: point pattern matching (PPM) and articulated motion tracking.

PPM is a fundamental problem for object recognition, motion estimation and image registration in a wide variety of circumstances [15,16]. Many of the approaches have focused on rigid, affine or projective correspondence, using techniques such as graphs, interpretation trees [17], Hausdorff distance [18], and geometric alignment and hashing [19]. In these cases, the developed techniques are based on geometric invariance or constraint satisfaction in affine transformations, yielding approximate mappings between objects and models [20]. However, these methods cannot be easily extended to the high configuration dimensionality of a complex articulated motion.

For modelling non-rigidity, elastic models [2,21], weighted-graph matching [22] and thin-plate spline approaches [23] have been developed to formulate densely distributed points into high-level structural presentations of lines, curves or surfaces. However, the necessary spatial data continuity is not available in the case of sparse points representing skeletal structures. Piecewise approaches [24,25] are probably the most appropriate for segmental data. In our case, a set of piecewise affine transformations with allowable distortion relaxation is sought for matching to an articulated segment hierarchy under kinematic constraints.

A second category of literature deals with the tracking of a particular type of articulated motion: human motion. Existing algorithms are commonly model-based, reconstructing precise poses from video images. The main challenge is to track a large number of degrees of freedom in high-dimensional freeform movements in the presence of image noise. To improve the reliability and efficiency of motion tracking, a spatial hierarchical search, using certain heuristics such as colour or appearance consistency, has proved successful [26,27]. However, the spatial hierarchy and matching heuristic may not be applicable in individual frames due to self-occlusion and image noise. In that case, spatio–temporal approaches have been shown advantageous in recent research. Sigal et al. [28] introduced a "loose-limbed model" to emphasise motion coherency in tracking. Lan and Huttenlocher [29] developed a "unified spatio–temporal" model exploring both spatial hierarchy and temporal coherency in articulated tracking from silhouette data. Spatio–temporal methods have enabled robust limb tracking in multi-target tracking [30], outdoor scene analysis [31] and 3D reconstruction of human motion [32]. Our work benefits from the spatio–temporal concept. However, methodologies based on the rich information of images cannot be adapted to our problem domain of motion reconstruction from concise feature-point representation.

Marker-based motion capture systems exemplify point-feature trackers [33]. Coloured markers, active markers, or sets of specially designed marker patterns have been used to encode identification information in some systems. Such approaches sidestep the hard problem of marker identification, but at the expense of losing application generality. The generic PPM problem in articulated motion is exemplified by a state-of-the-art optical MoCap system, e.g. Vicon [34], without recourse to marker coding. However, auto-identification may fail for complex motion. MoCap data normally need time-consuming manual post-processing before they can be used in actual applications.

Our previous baseline study [35] developed a segment-based articulated point matching algorithm for identifying an arbitrary pose of an articulated subject with sparse point features from single-frame data. The algorithm provided a self-initialisation phase of pose estimation, crucial at the beginning or on resumption of tracking. It utilised an iterative "coarse-to-fine" matching scheme, benefiting from the well-known iterative closest point (ICP) algorithm [36], to establish a set of relaxed affine segmental correspondences between a model point-set and an observed data set taken from one frame of articulated motion. However, we argued that more robust motion reconstruction should be possible by combining this with information from inter-frame tracking, which would eventually reduce the uncertainty inherent in the matching problem for single-frame data [37].

Pursuing the cross-fertilisation of this research with existing techniques, we extend our previous study on single-frame articulated pose identification [35,37] into a dynamic context. We propose a DSHPM algorithm targeting the reconstruction of articulated movement from motion sequences. The DSHPM algorithm integrates inter-frame tracking and spatial hierarchical matching to achieve effective articulated PPM in buffered real-time. The idea of segment-based articulated matching, as the computational substrate for exploring the spatial hierarchy [35,37], is enhanced by exploiting the motion coherency embedded in inter-frames, which ultimately reduces the ambiguity of identification in the presence of data noise.

3. Framework of the model-based DSHPM algorithm

The generic task under consideration arose from the need to identify feature-point data in order to reconstruct the underlying skeletal structure of freeform articulated motion. We assume the data capture rate is sufficiently high, as demanded in most real-world applications. This allows feature-point trajectories to be obtained in successive frames. However, the identities of the feature points (or trajectories) are not known.

3.1. The articulated model and motion data

The subject to be tracked is pre-modelled. A subject model comprises $S$ segments with complete feature points. Each segment $P_s = \{p_{s,i} \mid i = 1, \ldots, M_s\}$ has $M_s$ identified feature points $p_{s,i}$. The feature points are sufficient in number and distribution to indicate the orientation and segment structure in the demanded detail. Segment non-rigidity is allowed within a threshold given by a segmental distortion ratio $\varepsilon_s$. Articulation is indicated through join-point commonality between two segments, suggesting a segment-based hierarchy. To keep the algorithm general, each segment undergoes independent motion constrained only by joint points. We do not impose motion constraints, such as feasible biological poses, for a specific subject type.

The observed motion data of the subject are represented by a sequence of point-sets at each time frame $t$: $Q^t = \{q^t_j \mid j = 1, \ldots, N^t\}$, where the $N^t$ data points $q^t_j$ could be corrupted by missing data due to occlusion and by extra noise data arising from the process of feature extraction.
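To make the data organisation concrete, the following minimal Python sketch (our illustration, not part of the paper's Matlab implementation; the class and field names are assumptions) shows one plausible way to hold the model and the observed frames:

```python
from dataclasses import dataclass, field
from typing import Optional
import numpy as np

@dataclass
class Segment:
    """Model segment P_s: M_s identified feature points plus articulation links."""
    name: str
    points: np.ndarray                  # (M_s, 3) model feature points p_{s,i}
    distortion_ratio: float             # epsilon_s: allowed segmental distortion
    parent: Optional["Segment"] = None  # None for the root segment
    joint_index: Optional[int] = None   # index of the join point shared with the parent
    children: list = field(default_factory=list)

@dataclass
class Frame:
    """Observed point-set Q^t: unidentified and possibly corrupted."""
    t: int
    points: np.ndarray                  # (N_t, 3) observed data points q^t_j
```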

3.2. Outline of the DSHPM

To identify massive data within a complex motion sequence, frame-by-frame model fitting would be computationally expensive and unnecessary. Ideally, initial entire-model fitting need only be attempted at some key-frames, in particular at the start of tracking or on resumption from tracking failure. Subsequent identification of an individual feature point can be achieved by tracking over its trajectory. In the case of broken tracks, re-identification costs are largely reduced by reference to the known points whose identities are carried forward.

Fig. 1. Framework of the dynamic segment-based hierarchical point matching (DSHPM) algorithm. [Flowchart omitted; its stages — pre-tracking & pre-segmentation, CT-based iterative segmental matching in a key-frame range, motion verification & recruitment, dynamic key-frame-shift identification, backward parent-segment correction, and tracking with identity propagation depth-first along the hierarchical tree — are described below.]

The framework of the proposed DSHPM algorithm is shown in Fig. 1. As the computational substrate for initial model fitting, it employs a hierarchical segmental mapping supported by candidate-table (CT) optimisation at a key-frame (Section 4.2).

To reduce the inherent uncertainty of segmental matching in a single key-frame, a dynamic scheme (Section 4.3) incorporating temporal coherency in a key-frame range is explored through inter-frame tracking. This includes three phases: motion verification, dynamic key-frame-shift identification and backward parent-segment correction, as shown in Fig. 1. Under CT-based iterative matching, the algorithm first verifies that a proposed segmental match is consistent with an affine transformation over a period of movement, subject to a relaxed geometric invariance defined by a segmental distortion ratio $\varepsilon_s$. We name this process motion verification (Section 4.3.1). If a segment cannot be identified, or the segment identification cannot be proved correct by motion verification, reflecting poor segment data in the current key-frame, the algorithm shifts the key-frame forward by a certain time period to attempt re-identification. This process is denoted dynamic key-frame-shift identification (Section 4.3.2). The final phase, backward parent-segment correction, aims to correct any wrongly identified parent-segment that could cause subsequent unsuccessful matches of child-segments (Section 4.3.3).


In the dynamic process, a temporal key-frame shift is always accompanied by a recruitment procedure that forward-propagates already obtained identities and recruits any newly appearing matches in order to maintain the spatial hierarchy.

4. The DSHPM algorithm

Identification is carried out hierarchically, segment by segment, in a chosen key-frame containing, e.g., over 90% of the model points (Section 4.2), or in a key-frame range when necessary, taking advantage of a dynamic scheme (Section 4.3). In order to make the temporal coherence of motion cues exploitable and to reduce the search space, feature-point pre-tracking and pre-segmentation are carried out prior to segmental identification (Section 4.1).

4.1. Pre-tracking and pre-segmentation

Feature-point data are tracked over a time period before identification. We denote this step as pre-tracking in Fig. 1. This process allows inter-frame correspondences of feature points in a key-frame to be propagated backwards and forwards along stacked pre-tracked trajectories. It not only makes use of motion coherence for efficient segment retrieval, but also makes the key-frame-based identification feasible.

The pre-tracked trajectories exhibit relative motion cues of individual points. To reduce the search space of a segment, a pre-segmentation process is carried out prior to segmental identification, as shown in Fig. 1. We group unidentified points that maintain relatively constant distances during articulated movements as candidates for intra-segmental membership.

The pre-segmentation is subject to criteria that depend on the Euclidean distance $D^t_{i,j}$ between each pair of observed data points $(q_i, q_j)$ at frames $t = K + n\Delta$, $n = 0, 1, \ldots, 10$, starting from the key-frame $K$ and proceeding in intervals $\Delta$, where $\Delta$ denotes the motion relaxation interval, that is, the number of frames during which motion relaxation takes place, reflecting noticeable changes in pose. We determine intra-segmental point-pair candidature $(q_i, q_j)$ using a two-stage criterion with relaxation:

$$D^t_{i,j} < \Bigl(1 + \max_s \varepsilon_s\Bigr) \max_s D_s, \qquad (1)$$

$$\frac{\max_n \bigl(D^{K+n\Delta}_{i,j}\bigr) - \min_n \bigl(D^{K+n\Delta}_{i,j}\bigr)}{\mathrm{avg}_n \bigl(D^{K+n\Delta}_{i,j}\bigr)} < \max_s \varepsilon_s, \qquad (2)$$

where the segment distortion ratio $\varepsilon_s$ is determined by the relative variation of edge length among segmental point-pairs, reflecting the allowed "non-rigidity" of a segment in motion.

Criterion (1) indicates that the point-pair distance $D^t_{i,j}$ should be less than the maximum intra-segmental point-pair distance $D_s$, under maximum distortion relaxation. Criterion (2) requires that the ratio between the extremal distance difference and the average distance of the point-pair, over the intervals, should be less than the maximum distortion ratio allowed in any segment. If both criteria are satisfied, indicating that the point-pair $(q_i, q_j)$ maintains a consistent distance, with allowed relaxation, during articulated movements, and may therefore belong to the same segment, we store this information in a segmentation matrix $\mathrm{Seg}$ and set $\mathrm{Seg}(i,j) = 1$; otherwise we set $\mathrm{Seg}(i,j) = 0$ for an extra-segment pair.
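As a minimal sketch of this pre-segmentation step (our reconstruction; it assumes the pre-tracked trajectories are stored as a NaN-padded array, and evaluates criterion (1) at all sampled frames), criteria (1) and (2) can be applied to all point pairs at once:

```python
import numpy as np

def pre_segmentation(traj, K, delta, eps_max, D_max, n_samples=11):
    """Segmentation matrix Seg from criteria (1) and (2).

    traj      : (T, N, 3) pre-tracked trajectories, NaN where a point is missing
    K, delta  : key-frame index and motion relaxation interval (frames)
    eps_max   : max_s epsilon_s, largest allowed segmental distortion ratio
    D_max     : max_s D_s, largest intra-segmental point-pair distance in the model
    """
    frames = [K + n * delta for n in range(n_samples)]      # t = K + n*delta
    # Pairwise distances D^{K+n*delta}_{i,j} at each sampled frame: (n_samples, N, N).
    D = np.stack([np.linalg.norm(traj[t][:, None, :] - traj[t][None, :, :], axis=-1)
                  for t in frames])
    # Criterion (1): pair distance below the relaxed maximum segment extent.
    crit1 = np.nanmax(D, axis=0) < (1.0 + eps_max) * D_max
    # Criterion (2): relative variation of the pair distance stays within eps_max.
    variation = (np.nanmax(D, axis=0) - np.nanmin(D, axis=0)) / np.nanmean(D, axis=0)
    crit2 = variation < eps_max
    return (crit1 & crit2).astype(np.uint8)                 # Seg(i, j) in {0, 1}
```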

4.2. CT-based iterative segmental identification

Articulated motion maintains a relaxed geometric invariance in near-rigid segments. Matching at segment level is therefore preferable to brute-force global point-to-point searching. Initially, neither correspondence nor motion transformation is known for any segment or point. To identify a segment $P_s$ in an articulated structure at tracking start, we adapt the basic idea of the CT for pose identification, developed in our previous work [35,37], to the new context of a motion sequence. We enhance the static CT-based iterative segmental matching by exploiting the motion coherence embedded in inter-frames.

Briefly, a CT is created in two stages: (1) CT generation and optimisation, augmented by the pre-segmentation information (Section 4.1), and (2) CT-based iterative matching (Section 4.2.2).

4.2.1. CT generation

As explained in Refs. [35,37], the CT of segment $P_s$ is determined by intra-segmental distance similarity, here augmented with heuristic rigidity cues available from pre-segmentation. To define the column ordering of a CT for a segment in which no point has been identified, we arbitrarily choose a "pivot" reference point $p_{\mathrm{pivot}_s}$ and order the remaining model points $p_i$ by non-decreasing distance $D_{\mathrm{pivot}_s,i}$ from the pivot, giving a model pivot sequence for the segment.

To match the model pivot sequence, a sequential search is applied, with the possibility of rapid rejection of false candidates. Thus, from the unidentified data at key-frame $K$, arbitrarily choose an assumed pivot match $q^K_{a\mathrm{pivot}_s}$ of $p_{\mathrm{pivot}_s}$, and calculate its distance $D^K_{a\mathrm{pivot}_s,j}$ to all other unidentified points $q^K_j$.

To exclude large outliers of the segment based on the chosen pivot, a pivot-centred bounding box, relaxed by the distortion ratio $\varepsilon_s$, is applied [35]. Then, in the bounding subspace, the algorithm seeks match candidates for each model point based on distance similarity with reference to the assumed pivot $q^K_{a\mathrm{pivot}_s}$, satisfying the distortion tolerance

$$\left.\frac{\bigl|D_{\mathrm{pivot}_s,i} - D^K_{a\mathrm{pivot}_s,j}\bigr|}{D_{\mathrm{pivot}_s,i}}\right|_{\mathrm{Seg}(a\mathrm{pivot}_s,\,j)=1} < \varepsilon_s, \qquad (3)$$

in which the candidate selection is restricted by the pre-segmentation point-pair rigidity criterion $\mathrm{Seg}(a\mathrm{pivot}_s, j) = 1$. We list the candidates so selected in a table column as possible matches. The procedure is repeated for every element along the model pivot sequence, giving rise to an ordered matching sequence of columns that defines the CT for the assumed pivot match $(p_{\mathrm{pivot}_s}, q^K_{a\mathrm{pivot}_s})$.
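The following sketch (ours; the function name and the simplified bounding test are assumptions) builds one CT for an assumed pivot match according to Eq. (3):

```python
import numpy as np

def build_candidate_table(model_pts, data_pts, pivot_i, pivot_j, eps_s, Seg):
    """Build the CT for the assumed pivot match (model_pts[pivot_i],
    data_pts[pivot_j]) at the key-frame, following Eq. (3).

    Returns the model pivot sequence and one candidate column per model point.
    """
    # Model pivot sequence: remaining model points by non-decreasing pivot distance.
    d_model = np.linalg.norm(model_pts - model_pts[pivot_i], axis=1)
    order = [i for i in np.argsort(d_model) if i != pivot_i]
    # Distances of all data points to the assumed data pivot.
    d_data = np.linalg.norm(data_pts - data_pts[pivot_j], axis=1)
    # Pivot-centred bounding subspace, relaxed by the distortion ratio eps_s.
    in_box = d_data <= (1.0 + eps_s) * d_model.max()
    table = []
    for i in order:
        rel_err = np.abs(d_model[i] - d_data) / max(d_model[i], 1e-9)
        # Candidates: inside the box, rigidity-compatible with the pivot
        # (Seg = 1), and within the distortion tolerance of Eq. (3).
        cand = np.where(in_box & (Seg[pivot_j] == 1) & (rel_err < eps_s))[0]
        table.append(cand[np.argsort(rel_err[cand])])   # best candidates first
    return order, table
```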

Taking each unidentified point in turn as an assumed pivot match, we generate a set of CTs for that model pivot choice.


Heuristically, the CT constructed with the correct pivot match, if present in the data, should include more candidates than other CTs. To economise the iterative search at the next stage (Section 4.2.2), CT prioritising, by CT-culling, CT-ranking and candidate ordering, is applied to reduce the search space in which the correct solution is likely to be found, as discussed in Ref. [35]. The use of CTs makes the assumption of small motion or pose similarity, used e.g. in the ICP [36], unnecessary.

When a join point has already been identified during its parent-segment identification, that point is chosen as the unique pivot. In this case, only one CT is generated, resulting in a striking reduction of the search space.

4.2.2. Iterative segmental matching

To detect the presence or absence of a one-to-one intra-segmental feature-point correspondence in a CT from the reduced set of prioritised CTs, an iterative matching procedure is employed to seek the best transformation that interprets the segmental movement under distortion relaxation. We take the first candidates in the top row of a CT to provide an assumed initial match $Q^K_s$ of model segment $P_s$. An affine transformation $[R_s, T_s]$ is determined under this correspondence by a least-squares SVD method [38]. If this transformation maps the model points into rough alignment with their assumed matches, so that the average mapping error satisfies the desired matching quality bounded by the segmental distortion ratio,

$$e\bigl(P_s \rightarrow Q^K_s\bigr)\Big|_{[R_s, T_s]} < \varepsilon_s, \qquad (4)$$

then the assumed segmental match $(P_s \rightarrow Q^K_s)$ is taken as correct. Otherwise, there must be pseudo pairs in the matching assumption, which need to be removed by coarse-to-fine error-driven iteration. Pseudo pairs exaggerate individual matching errors at wrong matches. Based on this cue, we remove the worst match by replacing it with its next candidate in the CT, if one exists; otherwise we omit its match from the currently assumed correspondence in the CT. If this CT becomes exhausted before the best segment match is found, a new CT is interrogated [37].

To qualify as a whole segmental match, the transformation under the assumed correspondence should also guarantee the desired matching quantity fraction $\eta_s$,

$$\frac{\mathrm{number}\bigl(P_s \rightarrow Q^K_s\bigr)\big|_{[R_s, T_s]}}{M_s} > \eta_s, \qquad (5)$$

where $M_s$ is the number of feature points in model segment $P_s$. If the matching quantity criterion is not satisfied, the algorithm attempts to find the remaining matches of the segment, which may have been dropped during iterations or excluded from the CT on the grounds of limiting the search space via the stringent criteria of Eqs. (1)–(3). Finding the remaining matches is achieved by reassigning their nearest neighbours in the data under the current transformation $[R_s, T_s]$ (refer to the segment recruitment procedure shown in Fig. 3). If no such closest neighbour is found, we say the match of the point is lost.

Iterative motion estimation and refinement alternately update the assumed matches until converging to a segmental match $(P_s \rightarrow Q^K_s)$ in the correct CT, satisfying both the matching quality criterion, Eq. (4), and the matching quantity criterion, Eq. (5). In the event of no CT providing an acceptable match, the segment match is deemed not to exist in the current key-frame.

Fig. 2. Motion verification. [Pseudocode figure omitted.]

In the case of segments with fewer than three matching pairs, the SVD-based motion estimation cannot be applied, and segmental identification becomes highly uncertain. We confirm such a segment in the hierarchical chain depending on whether its children, or even grandchildren, can be found (Section 4.4).
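To make the inner loop concrete, here is a minimal sketch of the least-squares fit and the error-driven refinement. We assume a rigid SVD fit in the style of Arun et al. for [38] and a scale-normalised mean residual for the error $e(\cdot)$ in Eq. (4); both are our reading rather than the paper's exact formulation:

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares transform [R, T] mapping P onto Q via SVD (rigid variant;
    an affine fit could be substituted without changing the loop below)."""
    cP, cQ = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # repair a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cQ - R @ cP

def iterative_segment_match(P, table_pts, eps_s, eta_s, max_iter=100):
    """Coarse-to-fine, error-driven matching of model segment P against a CT.

    P         : (M_s, 3) model points, in pivot-sequence order
    table_pts : list of (k_i, 3) candidate-point arrays per model point, best first
    Returns (R, T, matches, quantity_ok) or None if the CT is exhausted.
    """
    assumed = {i: 0 for i, c in enumerate(table_pts) if len(c) > 0}
    for _ in range(max_iter):
        if len(assumed) < 3:
            return None                                  # SVD fit under-determined
        idx = sorted(assumed)
        Q = np.array([table_pts[i][assumed[i]] for i in idx])
        R, T = fit_rigid(P[idx], Q)
        err = np.linalg.norm(P[idx] @ R.T + T - Q, axis=1)
        scale = max(np.linalg.norm(P - P.mean(0), axis=1).mean(), 1e-9)
        if err.mean() / scale < eps_s:                   # matching quality, Eq. (4)
            quantity_ok = len(idx) / len(P) > eta_s      # matching quantity, Eq. (5)
            return R, T, {i: assumed[i] for i in idx}, quantity_ok
        worst = idx[int(np.argmax(err))]                 # pseudo-pair suspect
        if assumed[worst] + 1 < len(table_pts[worst]):
            assumed[worst] += 1                          # next candidate in the column
        else:
            del assumed[worst]                           # omit this pair
    return None
```

If the match fails, the next CT in the prioritised set would be interrogated; a failed quantity check would trigger the recruitment step described above.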

4.3. Dynamic identification

Single-frame spatial pose data alone may carry inherent uncertainty in determining the correct match from noisy data. In the dynamic context of a motion sequence, the geometric coherence embedded in the temporal domain along pre-tracked trajectories is used to improve the reliability of identification from a key-frame. As shown in Fig. 1, after the CT-based iterative segmental matching (Section 4.2), a dynamic scheme of motion verification (Section 4.3.1), dynamic key-frame-shift identification (Section 4.3.2) and backward parent-segment correction (Section 4.3.3) is applied in a key-frame range to guarantee the most likely correct segment identification.

4.3.1. Motion verification

In segment-based articulated motion, the geometric invariance of segmental "rigidity" should be maintained over movements. The idea of motion verification is therefore to propagate the segmental feature-point identities along their pre-tracked trajectories, and to confirm that an affine transformation under such correspondence still satisfies the matching quality criterion, Eq. (4), within the allowed distortion relaxation. As summarised in Fig. 2, the segmental matching obtained in the key-frame should be confirmed via its "rigidity" after a motion relaxation interval, a $\Delta$-frame shift of the key-frame.

When the key-frame segment match $(P_s \rightarrow Q^K_s)$ is confirmed, the algorithm will attempt to retrieve newly appearing points at the observed frame $K + \Delta$ if the segment is incomplete. This is achieved by the segment recruitment procedure shown in Fig. 3. If more matches are found at the observed frame $K + \Delta$, reflecting good data quality, the dynamic identification scheme favours a key-frame-shift to $K + \Delta$, described below.
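A sketch of the verification test (ours; it reuses fit_rigid and the same assumed error measure from the previous sketch):

```python
import numpy as np

def verify_motion(P, matches, traj, K, delta, eps_s):
    """Motion verification: the match found at key-frame K must still satisfy
    the quality criterion (4) after a delta-frame shift.

    matches : dict model-point index -> trajectory (data column) index
    traj    : (T, N, 3) pre-tracked trajectories, NaN where missing
    """
    # Keep only the matched points still visible at frame K + delta.
    idx = [i for i in sorted(matches)
           if np.isfinite(traj[K + delta, matches[i]]).all()]
    if len(idx) < 3:
        return False                     # too few survivors for an SVD re-fit
    Q = np.array([traj[K + delta, matches[i]] for i in idx])
    R, T = fit_rigid(P[idx], Q)          # from the previous sketch
    err = np.linalg.norm(P[idx] @ R.T + T - Q, axis=1)
    scale = max(np.linalg.norm(P - P.mean(0), axis=1).mean(), 1e-9)
    return bool(err.mean() / scale < eps_s)
```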


Fig. 3. Segment recruitment. [Pseudocode figure omitted.]

Fig. 4. Recursive Parent_segment_correction. [Pseudocode figure omitted.]

4.3.2. Dynamic key-frame-shift identification

The quality of some frame data for a segment may be very poor, on account of excessive missing/extra data or distortion. In this case, CT-based segmental identification may fail. This will break off the hierarchical search and result in serious uncertainty for successive child-segment identification. For this reason, we do not confine the segment matching to the initially chosen key-frame, but rather carry out matching in a key-frame range. If the segment identification or verification fails, the dynamic key-frame-shift module is used to re-identify the segment in up to two successive key-frame-shifts, as shown in Fig. 1.

In order to maintain the spatial hierarchy after the key-frame-shift, the recruitment procedure in Fig. 3 is applied to all incompletely identified segments, to encompass any previously missed matches and forward-propagate the obtained segmental identities into the new key-frame.

4.3.3. Recursive parent-segment correction

If two successive key-frame-shift processes still fail to identify or verify a segment $P_s$, this may imply that a wrong or highly distorted joint pivot is in use, derived from its parent-segment during hierarchical searching. In this case, the algorithm attempts a recursive backward parent-segment correction to check the join point and even its parent-segments, as described in Fig. 4.

If, after this series of dynamic attempts, no parent join is implicated in the failed identification, we ultimately abandon the segment. This indicates that the segment could have been occluded, or have poor data quality, even over the range of investigated frames.
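The control flow of Fig. 4 can be sketched as the following recursion (ours; `identify` and `next_best` stand for the CT-based identification and the alternative-hypothesis lookup, passed in as callables since the paper gives them only as pseudocode):

```python
def correct_parent_segment(segment, keyframe, identify, next_best,
                           depth=0, max_depth=3):
    """Recursive backward parent-segment correction (cf. Fig. 4).

    identify(segment, keyframe)  -> True if the segment identifies and verifies
    next_best(segment, keyframe) -> an alternative parent match, or None
    """
    parent = segment.parent
    if parent is None or depth >= max_depth:
        return False                   # no parent join to blame: abandon segment
    alt = next_best(parent, keyframe)  # alternative parent hypothesis
    if alt is None:
        # The grandparent's join point may itself be wrong: recurse upwards.
        if not correct_parent_segment(parent, keyframe, identify, next_best,
                                      depth + 1, max_depth):
            return False
        alt = next_best(parent, keyframe)
        if alt is None:
            return False
    parent.match = alt                 # adopt the corrected parent match (illustrative)
    return identify(segment, keyframe)  # retry the previously failed child
```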

4.4. Integrating temporal coherence with spatial hierarchy for articulated matching

Articulation at inter-segment joins is represented in a tree hierarchy. Consistent matching of articulated segments is carried out with respect to this tree. Such a spatial hierarchy organises segment identification in a parent–child ordering, thereby carrying forward identified join points from a parent to its children.

We assume that one of the segments of the articulated structure contains more points, and has more segments linked to it, than most other segments. We treat such a segment as the root, seeking to identify it first. After the root has been located, searching proceeds depth-first to child-segments along hierarchical chains, taking advantage of available joints located during parent-segment identification. This linkage through join points considerably increases the reliability and efficiency of child-segment identification. In the case of missing joint data on a parent segment, we recover a virtual joint if at least three identified points are obtained in the parent. When a parent has several children, searching prioritises the child with the most model points, as its identification incurs the least uncertainty from missing, extra or distorted data, and leads to the greatest subsequent search-space reduction. In the case of broken search chains in the hierarchy, due to a failed segment identification or a missing join point, identification will proceed to segments on other chains first, leaving any remaining child-segments on broken chains to be identified last, under conditions of a much reduced search space; a sketch of the traversal order follows.
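The basic traversal order can be sketched as follows (ours, using the Segment structure assumed earlier; re-queuing of broken chains is omitted for brevity):

```python
def identification_order(segments):
    """Hierarchical matching order: the segment with the most points and the
    most linked segments is taken as root; children are visited depth-first,
    the child with the most model points first.
    """
    root = max(segments, key=lambda s: (len(s.points), len(s.children)))
    order, stack = [], [root]
    while stack:
        seg = stack.pop()
        order.append(seg)
        # Push smaller children first so the largest child is popped (matched) next.
        for child in sorted(seg.children, key=lambda c: len(c.points)):
            stack.append(child)
    return order
```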

The segment-based hierarchical search operates dynamically in a key-frame range, rather than being confined to a single static data frame as in Refs. [35,37]. This lends robustness to segment identification, as data may be poor in one frame while good in another. When the hierarchical chain is broken in a chosen frame, the algorithm can shift to a new frame to carry on the search. Most existing feature-point identifications will be carried forward along pre-tracked trajectories to propagate spatial continuity into the new frame. An obtained segment match can be confirmed by the geometric invariance presented in the temporal domain of a motion relaxation interval $\Delta$ (Fig. 2). A failed child-segment identification caused by a wrongly inherited pivot from its parent can be corrected by a recursive procedure (Fig. 4). Meanwhile, the dynamic scheme allows efficient recruitment of reappearing segment points by reference to the known points whose identities are carried forward (Fig. 3). Our experimental results confirm that temporal coherency integrated with spatial hierarchical cohesion enhances identification of complex articulated motion in the presence of data corruption (Section 5).

4.5. Identification with tracking for registering a whole motion sequence

After segment-based dynamic hierarchical matching in a key-frame range, the identity of each feature point can be propagated along its trajectory by inter-frame tracking. The algorithm continues to track identified points forward throughout the whole motion sequence until missing data are encountered, causing broken trajectories. For missing data, the algorithm attempts to identify reappearing points, or even segments, and restart their tracking. This is much easier by reference to the already identified points than at the initial stage of model fitting, because the motion transformation can be obtained from partially identified segment matches. In the case of an entirely new appearing segment, identification complexity is evidently much reduced in the presence of previously established correspondences in the articulation.

5. Experiments

The algorithm has been implemented in Matlab. We tested it on articulated models, such as humans and robot manipulators, with various low densities and distributions of feature points. Human motion, representing a typical articulated motion with segments of only near-rigidity, makes the identification task more difficult than robot manipulator motion with rigid segments. To reflect this challenge, we report in this section experimental results on real-world human motion capture and its overlays with synthetic noise, for performance analysis of the algorithm.

In our experiments, all 3D model data and human motion data were acquired via a marker-based optical MoCap Vicon 512 system [34]. The measurement accuracy of this system is at the level of a few millimetres in a control volume spanning metres in linear extent. We attached markers as extrinsic feature points to a subject at key sites, indicating the segmental structure in the required detail. Marker attachment to tight clothing or bony landmarks nevertheless introduced inevitable segmental non-rigidity, due to the underlying soft body tissues. The sampling rate of human MoCap was 60 frames per second (fps) in our experiments.

5.1. Human motion reconstruction from MoCap data

A number of freeform movements captured from various subjects with various point distributions were investigated. The subject model is first generated off-line using one complete frame of feature-point data captured in a clear pose. We manually identified the data in a 3D-interactive display and grouped them into segments consistent with the underlying articulated anatomy. This produced a "stick-figure" skeleton model of the subject, as shown in the first of the figure sequences in Fig. 6(a) and (b). Having attached markers to the subject and defined its model, we proceeded with the capture of the subject's freeform motion using the Vicon MoCap 512 system.

5.1.1. Parameter setting

The segmental distortion ratio $\varepsilon_s$ in Eqs. (1)–(4) and the matching quantity fraction $\eta_s$ in Eq. (5) were pre-defined according to segment rigidity and the quality of the MoCap data. Precise values of the parameters are not required a priori, but algorithmic performance will be compromised by very inappropriate choices.

Fig. 5. Capture of static pose data for subject model generation. [Figure omitted.]

To provide experimental values of $\varepsilon_s$, we analysed a number of dynamic trials. We found that segmental distortion differs with body part and motion intensity. Thigh segments may give rise to large distortion, with $\varepsilon_s \approx 0.2$. A value of $\varepsilon_s \approx 0.05$ was found adequate for indicating the rigidity of the head, and an average $\varepsilon_s \approx 0.1$ for other body parts. We used one set of approximate distortion ratios, $\varepsilon_s = 0.05$–$0.2$, in all human motion experiments (Fig. 5).

For a rigid segment, a small value of $\varepsilon_s$ guarantees a precise matching quality and provides a good ability to reject outliers. We can therefore reduce the matching quantity requirement $\eta_s$ to gain more tolerance of missing data. For a deformable segment, a high $\varepsilon_s$ value allows increased distortion, but at the cost of an increased candidate search space and possibly low matching quality. To compensate, we have to raise the matching quantity requirement $\eta_s$, with possibly compromised handling of missing data. Based on the quality of the MoCap data and the rigidity of individual segments, we set the matching quantity $\eta_s = 0.80$–$0.90$. To reflect significant pose changes, we chose a motion relaxation interval $\Delta = 15$ frames, corresponding to 0.25 s at the MoCap rate of 60 fps.
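Collected as a sketch configuration (values as reported above; the grouping and key names are ours):

```python
# Empirical parameter choices used in the human-motion experiments.
PARAMS = {
    "distortion_ratio": {        # epsilon_s, by body part
        "head": 0.05,
        "thigh": 0.20,
        "default": 0.10,
    },
    "matching_quantity": (0.80, 0.90),   # eta_s range, chosen per segment rigidity
    "relaxation_interval": 15,           # Delta, in frames (0.25 s at 60 fps)
}
```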

5.1.2. Reconstruction results

Illustrative results of identified MoCap sequences, from representative full-body models of 27, 34 and 49 feature points in 15 segments, are given in Table 1 and Fig. 6. In Fig. 6, identified feature points are linked intra-segmentally. As shown in Fig. 6, there is no assumption of pose similarity between the models (the first frame in Fig. 6(a) and (b)) and their motion sequences, initially or during the movements. The captured motion sequences were subject to inevitable intermittent missing points, extra noise points and segmental distortion in complex intensive motion.


Table 1
Identification examples of human motion (MoCap rate at 60 fps)

Activity           | Sequence length in frames (seconds) | Number of trajectories | Identification rate (%) | Efficiency (identified frames per second)
-------------------|-------------------------------------|------------------------|-------------------------|------------------------------------------
Protocol 1: 27 feature points
Walking            | 600 (10)                            | 32                     | 96                      | 300
Running            | 600 (10)                            | 40                     | 95                      | 270
Freeform movement  | 1200 (20)                           | 93                     | 94                      | 220
Protocol 2: 34 feature points
Walking            | 600 (10)                            | 45                     | 98                      | 380
Running            | 600 (10)                            | 54                     | 97                      | 320
Freeform movement  | 1200 (20)                           | 106                    | 94                      | 250
Protocol 3: 49 feature points
Walking            | 600 (10)                            | 56                     | 96                      | 280
Running            | 600 (10)                            | 65                     | 94                      | 220
Freeform movement  | 1200 (20)                           | 128                    | 91                      | 130

We observe that the proposed DSHPM algorithm is capable of reconstructing an articulated motion represented by sparse or moderately dense feature points in the presence of data noise. Even when some key points, such as join points, or whole segments have been lost, the algorithm can still carry on the identification process by taking advantage of the proposed dynamic scheme.

This algorithm has been successfully used to identify a number of MoCap trials in a commercial project to produce the game "Dance: UK".1 Extracts from an identified dance sequence are shown in Fig. 7.

5.1.3. Performance analysis from the MoCap data

To demonstrate the performance of the DSHPM algorithm, Table 1 gives results of human motion identification obtained by applying the DSHPM algorithm to real-world MoCap trials. The average proportion of missing and extra data in the real-world MoCap examples is about 10–15%. Some general types of activities, listed in the first column, were tested under three marker protocols: 27, 34 and a denser 49 feature points. The freeform movement includes walking, running, jumping, bending and dance.

The results shown in each row of Table 1 were averaged over a number of trials from different subjects performing similar movements with the same marker protocol. The average length of each type of activity, measured in frames as shown in column 2, gives an indication of the absolute motion period at the 60 fps MoCap rate. The sequence length by itself does not always indicate the difficulty of identification. In the most favourable case of no tracking interruption, each feature point need be identified only once, allowing its identity to be propagated along its trajectory with minimum re-identification cost. However, a trial with occlusions in complex movements leads to increased computational cost, as each reappearing feature point is subjected to identification after tracking loss.

1 "Dance: UK" was developed in collaboration with Broadsword Interactive Ltd [38,39]. It was released during Christmas 2004.

To indicate the identification workload due to lost tracking, column 3 gives the number of trajectories, counting interruptions. These are generally higher than the indicated numbers of feature points, consistent with increasing identification difficulty.

Activities (first column) for each marker protocol are ordered by increasing movement complexity, accompanied by increased identification difficulty due to more missing data and extra noise data. This is reflected by the decreasing identification rate in column 4. The identification rate is defined as the percentage of correctly identified trajectories relative to the total number of trajectories encountered. The high trajectory-based rate emphasises correct identification obtained via segment-based articulated matching, rather than identification inherited only from inter-frame tracking, thus illustrating the effective nature of the algorithm.

We observed that the identification rate in Table 1 is in excess of 90% for all motion types, whether with sparser or denser feature points, even for large accelerative movements with big segmental distortion, such as in jumping and dancing, and for complex movements characterised by large numbers of broken trajectories due to occlusion.

Reconstruction efficiency of the algorithm depends on the complexity of the articulated model, but critically also on the motion conditions: the level of segmental distortion associated with movement intensity, and the frequency and amount of missing/extra data associated with motion complexity. We indicate an empirical reconstruction efficiency via "identified frames per second" in Table 1. This indicator is defined as the length of a trial (measured in frames) divided by the computational time (measured in seconds) when the DSHPM identification was executed as Matlab code on a Compaq Pentium IV with 512 MB RAM. We observe that for the common activities of walking and running, the identification efficiency under the type 2 marker protocol (34 feature points) is higher than under types 1 and 3 (27 and 49 feature points, respectively). Type 2 is a compromise between having too few feature points (type 1) to allow uninterrupted hierarchical searching in the case of missing data, and the denser data sets (type 3), with possible identification confusion and generally increased computation cost.


[Fig. 6 plots omitted: 3D stick-figure reconstructions; panel (a) shows the model and frames 50, 200, 350, 500, 650, 800 and 950; panel (b) shows the model and frames 60, 120, 180, 240, 300, 360 and 420.]

Fig. 6. Reconstructed human freeform movements: subject models of 15 segments followed by 7 sampled frames from their identified motion sequences: (a) human motion represented by 34 feature points; (b) human motion represented by 49 feature points.


Fig. 7. Dance trial reconstruction in the game project “Dance: UK”.


[Fig. 8 plots omitted: identification rate (0.5–1.0) versus additional distortion level (0–0.27) for the 30-point and 50-point trials.]

Fig. 8. Comparison of the static approach [35] with the DSHPM approach for motion reconstruction with additional synthetic distortion: (a) static identification; (b) dynamic identification.

For each type of marker protocol, identification efficiency decreases with increasing activity complexity, involving more broken trajectories and data noise. In all cases, the identification efficiency exceeded the 60 fps MoCap rate by at least a factor of two, making identification time competitive with real motion time. This suggests reconstruction for an on-line tracker realised in buffered real-time.

5.2. Evaluation based on synthetic distortion of real data

To evaluate the robustness and efficiency of the DSHPM algorithm, we used two MoCap walking sequences. Each has 600 frames, corresponding to 10 s at a MoCap rate of 60 fps. They were captured from the same subject, for comparability, in a sparse case of 30 points and a denser case of 50 points, respectively. Both sequences are denoted "ideal", having no missing or extra data, and minimal distortion, by virtue of marker attachment to the subject at tightly clothed key sites. Their identification rate by the DSHPM is 100%.

5.2.1. Identification of distorted motion data: dynamic versus static schemes

In the first series of experiments, we compared the identification effectiveness of the proposed dynamic strategy with that of a static identification scheme [35], under increasing motion distortion. In the latter, identification is carried out in isolation at each frame, without considering any inter-frame temporal coherence. To simulate the effect of distortion under variable motion intensity, we augmented the "ideal" motion data with synthetic noise on top of its natural distortion, as follows. Taking a pre-identified "ideal" walking sequence, we added Gaussian noise $N(0, 0.5\varepsilon l_s)/\sqrt{6}$ to the $x$, $y$ and $z$ coordinates of each point over the 600-frame sequences, the standard deviation being parameterised by a dimensionless distortion level $\varepsilon$ and an average segmental length $l_s$. The average identification rates (fraction of correctly identified points) over 500 trials, versus increasing distortion level, for both the sparser 30 and denser 50 feature-point walking trials, using either the static or the dynamic identification scheme, are given in Fig. 8(a) and (b).
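A sketch of this distortion overlay (ours; we read the second argument of $N(\cdot,\cdot)$ as the standard deviation):

```python
import numpy as np

def add_distortion(traj, level, seg_len, rng=None):
    """Overlay Gaussian noise N(0, 0.5*level*seg_len)/sqrt(6) on every
    coordinate of every point; traj has shape (T, N, 3)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = 0.5 * level * seg_len        # parameterised by distortion level epsilon
    return traj + rng.normal(0.0, sigma, size=traj.shape) / np.sqrt(6.0)
```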

We observe that increased distortion leads to more potential for confusion among neighbouring data points, especially for the denser point-set, and to greater loss of identification and tracking difficulty. However, comparing Fig. 8(a) and (b), the DSHPM algorithm achieves better identification rates than the static method at a given distortion level. This is more evident with increasing distortion, as is to be expected from the DSHPM robustness gained by exploiting the motion coherence embedded in inter-frames to survive spatially distorted data. The advantage of using the DSHPM is more obvious in the difficult situation of the denser set.

5.2.2. Identification of corrupted data with missing/extra data

The second series of experiments studied the ability of the DSHPM algorithm to identify "ideal" walking sequences subjected to increasing missing or extra data. To obtain test motion sequences with missing data, we removed feature-point data randomly and evenly among segments, with gaps continuing for 1–60 frames and average length $L_{\mathrm{corrupt}} = 30$ frames. To generate an extra trajectory in a volume encompassing the observed data, we randomly inserted two points, one in each of two frames 1–60 frames apart, and linearly interpolated the trajectory in between. The average length of an extra trajectory is therefore also $L_{\mathrm{corrupt}} = 30$ frames. The fraction of such corrupted (missing or extra) data is defined as $(L_{\mathrm{corrupt}}/L) \times (N_{\mathrm{corrupt}}/N)$, in which $N_{\mathrm{corrupt}}$ denotes the number of missing or extra trajectories generated, $L$ is the frame length of the sequence and $N$ is the number of model feature points.
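The corruption procedure can be sketched as follows (ours; gap placement, the ghost-trajectory volume, and the combined corruption fraction are simplified assumptions):

```python
import numpy as np

def corrupt(traj, n_missing, n_extra, max_gap=60, rng=None):
    """Inject missing gaps and extra trajectories into traj of shape (T, N, 3).

    Returns the corrupted data and the corruption fraction
    (L_corrupt / L) * (N_corrupt / N), with average L_corrupt ~ 30 frames.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, N, _ = traj.shape
    out = traj.copy()
    for _ in range(n_missing):                 # missing gaps of 1..max_gap frames
        j = int(rng.integers(N))
        g = int(rng.integers(1, max_gap + 1))
        t0 = int(rng.integers(0, T - g + 1))
        out[t0:t0 + g, j] = np.nan
    extras = np.full((T, n_extra, 3), np.nan)
    lo, hi = np.nanmin(traj), np.nanmax(traj)  # volume encompassing the data
    for k in range(n_extra):                   # linearly interpolated ghost tracks
        g = int(rng.integers(1, max_gap + 1))
        t0 = int(rng.integers(0, T - g))
        a, b = rng.uniform(lo, hi, 3), rng.uniform(lo, hi, 3)
        w = np.linspace(0.0, 1.0, g + 1)[:, None]
        extras[t0:t0 + g + 1, k] = (1.0 - w) * a + w * b
    avg_gap = 0.5 * (1 + max_gap)              # ~30 frames for max_gap = 60
    frac = (avg_gap / T) * ((n_missing + n_extra) / N)
    return np.concatenate([out, extras], axis=1), frac
```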

Average identification rates over 500 trials at different data corruption levels are shown in Fig. 9. Fractions of missing or extra data added to the "ideal" walking sequences are indicated by the bottom horizontal axis. The corresponding numbers of broken trajectories encountered at each missing or extra noise level, for the 30-point (and 50-point) trials, are given along the top axis. Comparing the left and right of the zero-line, the latter indicating identification of the "ideal" sequences with the original 30 or 50 trajectories, we observe that the algorithm demonstrates good robustness in rejecting large numbers of outliers, but rapidly fails to survive the inherent difficulty of increasing missing data. It is also evident that the denser set gains better tolerance of missing data than the depleted sparser point-set. However, more identification loss occurs for the denser set with extra data.


[Fig. 9 plot omitted: identification rate (0.5–1.1) versus fraction of missing (−) and extra (+) data (−0.24 to 1.0) for the 30-point and 50-point sets, with the corresponding numbers of trajectories along the top axis.]

Fig. 9. DSHPM identification with synthetic missing and extra data added in the "ideal" walking movements.

[Fig. 10 plot omitted: SVD-count (logarithmic scale, 10–1000) versus fraction of missing (−) and extra (+) data for the 30-point and 50-point sets.]

Fig. 10. SVD-count versus missing and extra data.

5.2.3. Complexity

During dynamic hierarchical matching, the identification step is computationally more intensive than inter-frame tracking. The motion transformation $[R_s, T_s]$, calculated by the SVD [38] under an assumed segmental correspondence, is the most time-consuming step, and is invoked in most essential modules, e.g. CT-based iterative segment matching, motion verification and recruitment. Therefore, in the last series of experiments, we measured an empirical complexity via the total number of SVD invocations, denoted the SVD-count.

We undertook such a complexity analysis for the two experiments of Section 5.2.2 above. SVD-counts versus missing/extra data in the walking sequences, for both the sparser and denser point sets, are shown in Fig. 10. In both cases, we observe that SVD-counts grow steadily with increasing extra data, in an approximately log-linear manner. Most SVD-counts are expended at the initial key-frame identification stage. Ideally, when all segments are identified without missing data, any outliers need only be tracked, without further identification cost. On the left side of the zero-line in Fig. 10, the SVD load increases rapidly with increasing missing data.


However, the growth tendency is restrained towards higher fractions of lost data. This is because, on the one hand, incomplete data cause more identification and verification difficulties during initial segmental matching, and the recruitment function that invokes the SVD is required to encompass any newly appearing matches; on the other hand, missing data reduce the number of points to be identified and raise the possibility of segment abandonment. Comparing the denser and sparser cases, both have the same number of segments, but the denser set leads to more populated CTs. This is likely to require greater numbers of match attempts, but the overall complexity is seen to grow only at some low power of the extra-data measure.

6. Conclusion

The proposed dynamic segment-based hierarchical point matching (DSHPM) algorithm addresses a general and currently open problem in pattern recognition: non-rigid articulated motion reconstruction from low-density feature points. The algorithm has a crucial self-initialisation phase of pose estimation, benefiting from our previous work [35,37]. In the context of a dynamic sequence, the DSHPM algorithm integrates key-frame-based dynamic hierarchical matching with inter-frame tracking to achieve computational efficiency and robustness to data noise. The candidate table optimisation heuristics are improved by exploiting the geometric coherency embedded in inter-frames. Segment-based articulated matching along a spatial hierarchy is significantly enhanced by a dynamic scheme, in the forms of motion-based verification, dynamic key-frame-shift identification and backward parent-segment correction. Performance analysis of the algorithm using synthetic data demonstrates the effectiveness of the dynamic scheme, which ultimately determines the robustness of articulated motion reconstruction and reduces the uncertainty inherent in the single-frame matching problem.

We provided illustrative experimental results of human motion reconstruction using 3D real-world MoCap data. Identification rates for most common freeform movements reached 90% or higher without manual intervention to aid the identification. Identification proceeded at over twice the common MoCap rate of 60 fps. This suggests the DSHPM algorithm is a candidate for self-initialising point-feature tracking and identification of articulated movement in real-time applications.

Acknowledgements

All model and motion data used in our experiments were obtained by a marker-based optical motion capture system, Vicon-512, installed at the Department of Computer Science, UWA. Some motion trials analysed in this paper were captured for the game project "Dance: UK" in collaboration with Broadsword Interactive Ltd. [39].

References

[1] J.K. Aggarwal, Q. Cai, W. Liao, B. Sabata, Articulated and elastic non-rigid motion: a review, in: Proceedings of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects, Austin, TX, 1994, pp. 2–14.
[2] J. Maintz, M. Viergever, A survey of medical image registration, IEEE Eng. Med. Biol. Mag. 2 (1) (1998) 1–36.
[3] J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by annealed particle filtering, in: Proceedings of the IEEE International Conference on CVPR, vol. 2, 2000, pp. 126–133.
[4] T. Moeslund, E. Granum, A survey of computer vision-based human motion capture, Comput. Vision Image Understanding 81 (3) (2001) 231–268.
[5] L. Wang, W. Hu, T. Tan, Recent developments in human motion analysis, Pattern Recognition 36 (3) (2003) 585–601.
[6] C. Cédras, M. Shah, A survey of motion analysis from moving light displays, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, Washington, June 1994, pp. 214–221.
[7] C. Taylor, Reconstruction of articulated objects from point correspondences in a single uncalibrated image, Comput. Vision Image Understanding 80 (3) (2000) 349–363.
[8] J. Zhang, R. Collins, Y. Liu, Representation and matching of articulated shapes, in: Proceedings of the IEEE International Conference on CVPR, vol. 2, 2004, pp. 342–349.
[9] I. Cox, S. Hingorani, An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking, IEEE Trans. Pattern Anal. Mach. Intell. 18 (2) (1996) 138–150.
[10] S. Deb, M. Yeddanapudi, K. Pattipati, Y. Bar-Shalom, A generalized S-D assignment algorithm for multisensor–multitarget state estimation, IEEE Trans. Aerosp. Electron. Syst. 33 (2) (1997) 523–538.
[11] C. Veenman, M. Reinders, E. Backer, Resolving motion correspondence for densely moving points, IEEE Trans. Pattern Anal. Mach. Intell. 23 (1) (2001) 54–72.
[12] Y. Wang, Feature point correspondence between consecutive frames based on genetic algorithm, Int. J. Robot. Autom. 21 (2006) 2841–2862.
[13] M. Ringer, J. Lasenby, Modelling and tracking articulated motion from multiple camera views, in: Proceedings of the British Machine Vision Conference, Bristol, UK, September 2000, pp. 172–182.
[14] B. Li, H. Holstein, Dynamic segment-based sparse feature-point matching in articulate motion, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2002.
[15] R. Campbell, P. Flynn, A survey of free-form object representation and recognition techniques, Comput. Vision Image Understanding 81 (2001) 166–210.
[16] B. Li, Q. Meng, H. Holstein, Point pattern matching and applications—a review, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Washington, DC, USA, October 2003.
[17] V. Gaede, O. Günther, Multidimensional access methods, ACM Comput. Surv. 30 (2) (1998) 170–231.
[18] D.M. Mount, N.S. Netanyahu, J.L. Moigne, Efficient algorithms for robust feature matching, Pattern Recognition 32 (1999) 17–38.
[19] H.J. Wolfson, I. Rigoutsos, Geometric hashing: an overview, IEEE Comput. Sci. Eng. 4 (1997) 10–21.
[20] W.E.L. Grimson, T. Lozano-Perez, D. Huttenlocher, Object Recognition by Computer: The Role of Geometric Constraints, MIT Press, Cambridge, MA, 1990.
[21] E. Bardinet, L.D. Cohen, N. Ayache, A parametric deformable model to fit unstructured 3D data, Comput. Vision Image Understanding 71 (1) (1998) 39–54.
[22] A. Cross, E. Hancock, Graph matching with a dual-step EM algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1236–1253.
[23] H. Chui, A. Rangarajan, A new point matching algorithm for non-rigid registration, Comput. Vision Image Understanding 89 (2003) 114–141.
[24] A. Pitiot, G. Malandain, E. Bardinet, P. Thompson, Piecewise affine registration of biological images, in: Second International Workshop on Biomedical Image Registration, 2003.
[25] G. Seetharaman, G. Gasperas, K. Palaniappan, A piecewise affine model for image registration in nonrigid motion analysis, in: Proceedings of the IEEE International Conference on Image Processing, 2000, pp. 1233–1238.
[26] D. Forsyth, D. Ramanan, C. Sminchisescu, People tracking, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
[27] D. Gavrila, L. Davis, Model-based tracking of humans in action: a multi-view approach, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, San Francisco, 1996, pp. 73–80.
[28] L. Sigal, S. Bhatia, S. Roth, M. Black, M. Isard, Tracking loose-limbed people, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[29] X. Lan, D. Huttenlocher, A unified spatio–temporal articulated model for tracking, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[30] H. Nguyen, Q. Ji, A. Smeulders, Robust multi-target tracking using spatio–temporal context, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
[31] T. Haga, K. Sumi, Y. Yagi, Human detection in outdoor scene using spatio–temporal motion analysis, in: Proceedings of the IEEE International Conference on Pattern Recognition, 2004.
[32] L. Kakadiaris, D. Metaxas, Model-based estimation of 3D human motion, IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (2000) 1453–1459.
[33] J. Richards, The measurement of human motion: a comparison of commercially available systems, Human Movement Sci. 18 (5) (1999) 589–602.
[34] 〈www.vicon.com〉. Vicon Motion Systems.
[35] B. Li, Q. Meng, H. Holstein, Reconstruction of segmentally articulated structure in freeform movement with low density feature points, Image and Vision Comput. 22 (10) (2004) 749–759.
[36] P.J. Besl, N.D. McKay, A method for registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell. 14 (2) (1992) 239–255.
[37] B. Li, Q. Meng, H. Holstein, Articulated pose identification with sparse point features, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (3) (2004) 1412–1423.
[38] K.S. Arun, T.S. Huang, S.D. Blostein, Least-squares fitting of two 3-D point sets, IEEE Trans. Pattern Anal. Mach. Intell. 9 (5) (1987) 698–700.
[39] 〈www.broadsword.co.uk〉. Broadsword Interactive Ltd.

About the Author—BAIHUA LI received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China, and the Ph.D. degree in computer science from the University of Wales, Aberystwyth, in 2003. She is a Lecturer in the Department of Computing and Mathematics, Manchester Metropolitan University, UK. Her current research interests include computer vision, pattern recognition, human motion tracking and recognition, 3D modelling and animation.

About the Author—QINGGANG MENG received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China, and the Ph.D. degree in computer science from the University of Wales, Aberystwyth, in 2003. He is a Lecturer in the Department of Computer Science, Loughborough University, UK. His research interests include biologically/psychologically inspired robot learning and control, machine vision and service robotics.

About the Author—HORST HOLSTEIN received the degree of B.S. in Mathematics from the University of Southampton, UK, in 1963, and obtained a Ph.D. in the field of rheology from the University of Wales, Aberystwyth, UK, in 1981. He is a Lecturer in the Department of Computer Science, University of Wales, Aberystwyth, UK. His research interests include motion tracking, computational bioengineering and geophysical gravi-magnetic modelling.