
Motion Curves: A versatile representation for motion data

by

Kevin Forbes

A thesis submitted in conformity with the requirements for the degree of Master of Science

Graduate Department of Computer Science

University of Toronto

Copyright © 2005 by Kevin Forbes

Abstract

Motion Curves: A versatile representation for motion data

Kevin Forbes

Master of Science

Graduate Department of Computer Science

University of Toronto

2005

This thesis presents Motion Curve space: a novel representation scheme for the poses of an articulated skeletal figure. A Motion Curve space is defined by a set of orthogonal basis vectors that have been found by performing a weighted principal component analysis on an example motion clip. An animator can control the properties of the space through the selection of the example clip and the PCA weights. We explore the expressive and computational power of the representation through the creation of several new motion processing and analysis algorithms, which are demonstrated through prototype applications. These prototypes help to establish the workflow for a hypothetical production application. In presenting this work, we hope to expand the size of the animator’s toolbox. By providing a new and usable framework for editing motions, we make it possible to quickly modify existing motion assets and stretch animation budgets.


Acknowledgements

I’d like to thank my advisor, Dr. Eugene Fiume, for his guidance and for giving me the freedom to pursue my choice of research topics. I’d also like to thank Dr. Karan Singh for being my second reader.

Science is an inherently collaborative endeavor, and I am indebted to everyone in the lab who offered suggestions and help along the way. I owe Alex Kolliopoulos a huge favour at some point for his help with submitting this thesis from a distance. I owe my wife, Shannon, an even bigger favour for her moral support over the course of this project.

Finally, I’d like to thank OGS and NSERC for financial support.


Contents

1 Introduction
1.1 Motivation
1.2 Statement of Thesis
1.3 Contributions
1.4 Thesis Organization
2 Background
2.1 Representing Poses
2.1.1 Skeletal Animation
2.1.2 Driving a Mesh
2.1.3 Other Pose Representations
2.2 PCA
2.3 Creating Motion
2.3.1 Keyframing
2.3.2 Rotoscoping
2.3.3 Motion Capture
2.3.4 Simulation
2.3.5 Digital Puppetry
2.4 Motion Processing
2.4.1 Signal Based Techniques
2.4.2 State Based Techniques
2.5 Motion Segmentation and Recognition
2.6 High Dimensional Data Search Techniques
2.7 Summary
3 Motion Representation
3.1 The Trouble with Motion Data
3.2 Motion Curve Space
3.2.1 Constructing the space
3.2.2 Projections and Unprojections
3.3 Space Characteristics
3.3.1 Pose Distance Metric
3.3.2 Dimensionality Reduction
3.3.3 Visualization
3.3.4 Representational Error
3.4 Pose Detection in Motion Curve Space
3.5 Summary
4 Interpolation
4.1 Two-Pose Interpolation
4.2 M-way Interpolation
4.3 Improved Non-overlapping Blends
4.4 Case study - Motion Graphs
4.5 Summary
5 Geometric Operations
5.1 Overview
5.2 Finding Mean Poses
5.3 Scaling-Based Operations
5.4 Translation-Based Operations
5.5 Filtering
5.5.1 A Wavelet Approach to Smoothing
5.6 Case Study - PCA Explorer
5.7 Extensions: Joint Limits and Selective Blending
5.8 Summary
6 Unsegmented Motion Searching
6.1 Introduction and Motivation
6.2 Algorithm
6.2.1 Finding the Characteristic Point
6.2.2 Generating Seed Points
6.2.3 Seed Point Clustering
6.2.4 Dynamic Time Warping
6.2.5 Results Ranking
6.2.6 Interface
6.3 Results
6.3.1 Synthetic Data
6.3.2 Validation
6.3.3 Motion Capture Data
6.3.4 Scalability
6.3.5 Performance Optimization
6.4 Summary
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future work
7.2.1 Representation
7.2.2 New Operators
7.2.3 Search Refinements
7.2.4 Software Development
Bibliography


Chapter 1

Introduction

1.1 Motivation

Animation provides a powerful and compelling artistic medium. Given complete control over the canvas, an animator can envision anything from the abstract work of Norman McLaren to the gritty hyper-reality of Linklater’s adaptation of A Scanner Darkly. Within the frames of moving images, there are no physical constraints limiting what can be represented. Where sculptors must do battle with gravity, musicians must play within the range of their instruments, and dancers may bend but not transcend the capabilities of the human body, the animator, in theory, is only held back by his or her imagination.

That said, animation does in fact have real-world limits. One of the constraints upon animation is economic, rather than technical. While it is theoretically possible to create any sequence of two-dimensional images, the types of sequences that are economically feasible, in terms of available time and expertise, are limited by the expressiveness of the animators’ tools and the available computational resources.

In the beginning, the only way to create animation was to hand-draw every frame. This approach rapidly becomes infeasible for an animation of any appreciable length. Even if a single animator drawing every frame from scratch could meet the demands of frame-to-frame consistency and stave off the tedium of re-drawing rarely-changing objects again and again, s/he simply could not draw quickly enough to complete a complicated project within a reasonable amount of time.

Traditional animation studios developed a catalogue of techniques to surmount these limitations. Cel animation separates the foreground from the background by placing layers on transparent sheets, allowing each to be animated separately. Keyframing allows a lead animator to define the flow of an animation with very few drawings, leaving the bulk of the drawing to a team of junior artists. These techniques allow for animation reuse and for parallel frame production, both of which improve a studio’s throughput.

These techniques carried the animation industry for many decades, from Snow White to Saturday morning cartoons. But as the twentieth century drew to a close, cheap computing power and digital storage revolutionized the medium. Computers tend to change the way in which we do things (not always for the better). In no field is this more true than in animation.

The invention of the word processor may have changed the interface one uses to write a novel, but it did not change the actual substance of the activity of writing. Computer animation, however, is an entirely different medium from its 2D predecessor. With shifts in technology and workflow, and an inexorable increase in audience expectations, the animator’s task has fundamentally changed. The creation of motion data has been freed from the representation of the character exhibiting the motion. Modelers and texture painters create detailed three-dimensional descriptions of sets, props, and characters that can be rendered (relatively) quickly, from any angle. An animator typically interacts with these virtual objects by directly manipulating them, or through procedural methods. The product of the animator’s labour is no longer a single, concrete representation of a moving character, but rather an abstract representation of a character’s movement.

In this way, the task of animation has come to resemble puppetry, but with an important distinction. The motion that a traditional puppeteer creates is real, and in-the-moment. It is by nature ephemeral: it is a performance, not a piece. The motion that a 3D animator creates, in contrast, is an abstract mathematical representation of a motion. It is data. As such, it can be stored, manipulated, and re-used like any other piece of data. This presents new opportunities for expression. Much in the way that digital sampling has expanded the scope of musical expression, digital motion editing and re-use have the potential to create new ways to work with motion.

If a 3D animator is to fully exploit the medium’s digital nature, he or she will need two things: a large body of existing motion clips with which to work, and a flexible representation for the motion that facilitates interesting and expressive operators. The first requirement can be met by using motion capture. This thesis seeks to fill the second requirement.

Motion capture enables the quick recording and representation of subtle, nuanced physical performance. Unfortunately, the cost of purchasing or renting time with motion capture equipment is often prohibitive. Techniques that facilitate the synthesis of new motions from existing motion clips help to alleviate this problem. Animation software lets the animator manually edit the individual degrees of freedom of an animation. While this does permit animation re-use, the process is tedious. Semi-automatic techniques, which operate over more than one degree of freedom at a time under an animator’s direction, can be much more useful.

Sequences of motion capture data are usually stored and processed as hierarchical lists of orientations. Most methods for expressing orientations have undesirable properties, such as non-Euclidean distance metrics or discontinuities, which complicate the treatment of the data. It would be advantageous to transform the data into a form that is easier to work with.
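To illustrate the kind of discontinuity meant here, consider a single joint angle stored in degrees. The sketch below is our own illustration, and the helper names are hypothetical rather than taken from this thesis. Two angles that describe nearly the same orientation can be far apart when treated as ordinary numbers, and a naive Euclidean metric over raw joint angles does exactly that.

```python
def naive_distance(a_deg: float, b_deg: float) -> float:
    """Treat two joint angles as ordinary numbers (what a Euclidean metric does)."""
    return abs(a_deg - b_deg)


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Shortest rotation between two angles, respecting the 360-degree wrap-around."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    # 359 and 1 degrees are only 2 degrees apart as orientations,
    # but their stored values differ by 358.
    print(naive_distance(359.0, 1.0))    # 358.0
    print(angular_distance(359.0, 1.0))  # 2.0
```

Wrap-around is only one such problem. Representations such as Euler angles also suffer from gimbal lock, which this small example does not address.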

As we shall show, it is possible to perform a weighted principal components analysis on pose data. Projecting the poses of a motion into the resulting Euclidean space yields a series of points that can be used to define a discrete but explicit path through a high-dimensional space. We reconstruct such paths as curves, which we call Motion Curves. This representation allows for the direct application of techniques from geometric modeling and signal processing. These techniques can be used to simplify animation tasks, such as interpolation. They also present new and interesting ways to interact with motion data, and have been leveraged to create unique motion editing tools.

1.2 Statement of Thesis

This thesis formalizes the Motion Curve representation, and explores the expressive power of various operations within the Motion Curve space. In doing so, it introduces several new algorithms for dealing with motion data, including a search algorithm for unsegmented motion clips. These algorithms are implemented as standalone prototypes utilizing a common data format. The purpose of these prototypes is to establish the workflow for a hypothetical production application. The prototypes validate the functionality of the proposed application.

In presenting this work, we hope to expand the size of the animator’s toolbox. By providing a new and usable framework for editing motions, we make it possible to quickly modify existing motion assets and stretch animation budgets. Our techniques can also be used to modify motions dynamically and continuously, in situations such as games or real-time visualizations. In this context, our work gives the designer of such a system meaningful axes for high-level control of animations. It also provides a flexible framework for pose interpolation, which can be integrated with existing blend-based animation systems.

1.3 Contributions

The main contribution of this thesis is the introduction and characterization of the Motion Curve representation. The unique characteristics of this representation permit the development of several useful algorithms for dealing with motion data. We provide both low-level data manipulation tools and high-level algorithms that leverage them.

The major technical contributions include:

• An algorithm for robustly detecting key poses (section 3.4)

• Quick, M-way pose interpolation (section 4.2)

• A prototype motion editing application that implements several unique operators (chapter 5)

• A search algorithm for unsegmented motion data, which was published as [24].

1.4 Thesis Organization

Chapter 2 presents an overview of the state of the art in the various fields this work touches upon. It begins by discussing the representation of poses in the literature. The standard skeletal hierarchy model is presented in detail, and other more obscure or specialized models are mentioned briefly. Next, the various methods used to generate motion data are explained. We continue by presenting an overview of modern motion processing techniques, dividing the field into two camps: signal-based and state-based. We then survey some recent works in motion segmentation and automatic recognition, and finish with a survey of high dimensional data search techniques. The information in this chapter provides context for the ensuing work.

The Motion Curve representation is formalized in chapter 3. A case for Motion Curves is built first, by discussing existing representations, and enumerating a list of desirable but as yet unmet characteristics for a motion representation. The steps for constructing a Motion Curve space are enumerated next. The chapter ends by demonstrating the properties of the representation, and presenting a method for pose detection using statistical modeling within the space. This chapter is crucial to the remainder of the thesis, because all of the techniques developed later depend upon the Motion Curve representation.

Several new results in motion interpolation are presented in chapter 4. The first is a simple method for multi-way interpolation. Next, we demonstrate linear interpolation in the Motion Curve space, and compare the results to the standard spherical linear interpolation result. We also present a method for preserving the appearance of dynamics when extrapolating through gaps between motion clips. The methods introduced in this chapter greatly simplify several very important cases of the pose interpolation problem.

In chapter 5, several families of geometric motion editing operations are introduced. First, a method for finding bounded mean poses is presented. This method is then used to develop a series of operations based upon scaling and translation which can be used to change the character of regions of motion clips. Examples of edited clips are presented. In addition, several operations are presented that lack artistic usefulness, but help to flesh out the space. Finally, the filtering of Motion Curves is discussed, and a wavelet decomposition model is built. The operators described in this chapter provide enough functionality for a highly expressive motion editing platform.

An efficient search algorithm for unsegmented motion clips is presented in Chapter 6. This search algorithm finds the regions in a long database clip that are most similar to a short query clip. The components of the algorithm are first presented in isolation, then the performance of the resulting system is evaluated through experimentation. The example-based search algorithm presented in this chapter is useful in its own right, and is a powerful enhancement to the motion editing platform described in the previous chapter.

Chapter 7 presents the future work stimulated by this thesis, and draws conclusions from the results presented in previous chapters.

Chapter 2

Background

In this chapter, we survey the state of the art in animation representations. We begin by laying down the fundamentals of how poses are stored and manipulated in modern works. This leads into a discussion of how motion data is represented for editing in both manual and automatic contexts. Automatic editing contexts often include an element of pose-based segmentation or recognition, so we outline these areas as well. Finally, we discuss high dimensional data search techniques, which provide background for our work on motion searching.

2.1 Representing Poses

In this thesis, we describe a pose as the instantaneous configuration of an articulated figure. We only consider the figure’s spatial position; poses are regarded outside of time. In this section, we describe the most commonly used pose representation in detail, and briefly discuss other models.


2.1.1 Skeletal Animation

Human character animation, whether rendered in real time or off-line, is usually implemented using a hierarchical skeletal model. In such a model, the body is divided into rigid sections, called bones, that roughly correspond to the character’s skeleton. These bones are arranged hierarchically, with each parent-child relationship forming a joint. The orientation of each joint can be represented as a local rotation matrix, and each bone can be represented as a rigid translation. Motion can be introduced to the system by changing the matrices over time. In general, the joint transformation can be any combination of transformations, although many systems make the simplifying assumption that all joints are purely rotational.

The character can be posed by specifying rotational values at the joints. Joint limits derived from anatomical data are often enforced to prevent the skeleton from assuming unrealistic positions, although these do nothing to prevent self-intersection or to enforce balance constraints. Joints may be constrained to only allow movement along certain axes. Each of these axes is referred to as a degree of freedom. A pose is fully specified by a complete listing of all of its degrees of freedom. Often, some elements of the global position and orientation of the root bone (usually the pelvis) are also included in the definition of the pose.

The mathematics of the transformation are quite simple. Consider the joint and bone hierarchy depicted in Figure 2.1. Each bone has a translation matrix associated with it, denoted $T_x$. Rotational joint transformations are named after their child joints, and denoted $R_x$. The root transform, which can be any combination of rotations and translations, is $M_{\mathrm{root}}$. To express the position $P$ of the far end of a bone in world coordinates, we need only concatenate the transformations. The full transformation for the tip of each bone can be expressed as


[Figure 2.1: The transformation chain of a skeletal hierarchy. The figure labels bone translations T1 to T4, joint rotations R1 to R4, bone-tip positions P1 to P4, and a root transform.]

$$P_1 = T_1 R_1 M_{\mathrm{root}} \tag{2.1}$$

$$P_2 = T_2 R_2 P_1 \tag{2.2}$$

$$P_3 = T_3 R_3 P_2 \tag{2.3}$$

$$P_4 = T_4 R_4 P_2 \tag{2.4}$$

These transformations can be used directly to position graphical representations of the bones, or the transformation chain can be used to drive a linear blend skinning scheme (discussed in the next section). For simplicity, the application prototypes developed for this thesis use a graphical model made of rigid body parts.
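The chain of Equations 2.1 to 2.4 can be sketched as follows. This is a minimal planar illustration under our own assumptions:

- 3x3 homogeneous matrices are used.
- Points are row vectors, so the leftmost matrix in each product is applied first, matching the left-to-right order of the equations.
- All bones have unit length.
- The function names are hypothetical.

As in Equation 2.4, bones 3 and 4 both branch from the tip of bone 2.

```python
import math

# 3x3 homogeneous matrices for a planar skeleton. Points are row vectors
# [x, y, 1], so in a product such as T @ R @ M the matrix T is applied first.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(dx: float, dy: float) -> Matrix:
    # Bone transform T_x: a rigid offset along the bone (bottom row for row vectors).
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [dx, dy, 1.0]]


def rotation(theta: float) -> Matrix:
    # Joint transform R_x: counter-clockwise rotation by theta radians.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0],
            [-s, c, 0.0],
            [0.0, 0.0, 1.0]]


def tip_position(p: Matrix) -> tuple[float, float]:
    # A bone tip is the local origin (0, 0, 1) carried through the chain;
    # with row vectors that is simply the bottom row of P.
    return (p[2][0], p[2][1])


def example_chain() -> dict[str, tuple[float, float]]:
    m_root = rotation(0.0)      # identity root transform
    t = translation(1.0, 0.0)   # every bone has unit length along its local x axis
    p1 = matmul(matmul(t, rotation(math.pi / 2)), m_root)  # Eq. 2.1
    p2 = matmul(matmul(t, rotation(0.0)), p1)              # Eq. 2.2
    p3 = matmul(matmul(t, rotation(-math.pi / 2)), p2)     # Eq. 2.3
    p4 = matmul(matmul(t, rotation(math.pi / 2)), p2)      # Eq. 2.4 (also hangs off P2)
    return {name: tip_position(p)
            for name, p in {"P1": p1, "P2": p2, "P3": p3, "P4": p4}.items()}
```

In this configuration joint 1 is bent by 90 degrees, joint 2 is straight, and joints 3 and 4 are bent by -90 and +90 degrees. The tips land at (0, 1), (0, 2), (1, 2) and (-1, 2), up to floating-point rounding.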

This model contains many simplifications. Often, many fewer bones are used than exist in an actual skeleton. For example, the human spine has 26 vertebrae, but the default skeleton used by the Vicon 9 motion capture system has only 3 bones in its pelvis-to-head chain. Joints are commonly simplified in terms of allowable axes of rotation. The translational effects of stretched tendons and soft tissue are ignored when translations are excluded from the joint transform. The assumption that bones are rigid is also suspect, as real bones exhibit surprising flexibility under load.

2.1.2 Driving a Mesh

Skeletons provide a fast and convenient method for representing the motion of an articulated figure, but are not attractive when rendered. The eventual goal with most character animation is to deform a surface model. The underlying motion representation discussed in the previous section is often used to drive such a deformation. Given a complex enough surface model and deformation method, the results can look quite good. With greater artistic expectations, however, comes a requirement for more realistic motions. In [31], Hodgins et al. present experimental results suggesting that people are better able to spot differences in motions when they are expressed through a polygonal surface model rather than through a stick figure.

One of the simplest methods for deforming a mesh by a skeleton is linear blend skinning, which maps each vertex to one or more of the figure’s joints with a set of real-valued weights. Each vertex is then deformed by the corresponding weighted linear combination of its parent joints’ transforms [39]. This technique is conceptually simple, and can be implemented in graphics hardware. As such, it is often used in games or similar real-time applications. While linear blend skinning provides a fast solution for on-line applications, it introduces unsightly artifacts to the mesh. For this reason, more complicated models are often used in off-line animation.
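The linear blend skinning rule can be sketched in a few lines. This is a minimal 2D illustration built on our own assumptions, not code from this thesis or from any particular engine:

- Homogeneous matrices act on row vectors, as in the forward-kinematics equations of the previous section.
- Each vertex's weights are non-negative and sum to one.
- Each joint matrix already maps rest-pose coordinates into the current pose, that is, it includes the inverse of the joint's rest transform.

```python
Matrix = list[list[float]]


def transform_point(v: tuple[float, float], m: Matrix) -> tuple[float, float]:
    # Row-vector convention: [x, y, 1] @ M, keeping only the x and y results.
    x, y = v
    return (x * m[0][0] + y * m[1][0] + m[2][0],
            x * m[0][1] + y * m[1][1] + m[2][1])


def skin_vertex(v, influences):
    """Linear blend skinning: v' = sum_j w_j * (v transformed by joint j).

    `influences` is a list of (weight, joint_matrix) pairs; the weights are
    assumed to be non-negative and to sum to one.
    """
    out_x = out_y = 0.0
    for weight, m in influences:
        px, py = transform_point(v, m)
        out_x += weight * px
        out_y += weight * py
    return (out_x, out_y)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    half_turn = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # 180-degree rotation
    # A vertex split evenly between a static joint and a joint rotated by
    # 180 degrees collapses onto the joint centre: an example of the
    # volume-loss artifacts that linear blending can introduce.
    print(skin_vertex((1.0, 0.0), [(0.5, identity), (0.5, half_turn)]))  # (0.0, 0.0)
```

Because every vertex needs only a few matrix-vector products and a weighted sum, the same computation maps naturally onto graphics hardware.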

2.1.3 Other Pose Representations

Not all researchers use a skeletal animation system. Skeletal animation, even when paired with a skinning algorithm, is a simplification that does not accurately reflect the deformations of a flexing subject’s surface.

In [1], Alexa and Muller present a PCA-based vertex representation for time-varying geometry. They perform PCA on an animation represented by a collection of keyframe meshes with isomorphic vertex-edge topology. The use of PCA has several benefits, but the motivating factor in this case is dimensionality reduction. The representation also facilitated a mesh-correspondence algorithm to transfer animations between similar meshes.

Kovar et al. use a skeletal animation system with a point-based pose distance metric in an ongoing series of papers [45, 44, 43]. In order to compute a distance between two poses, a low-resolution mesh is deformed to each of the two poses. The metric is then based upon the sum of squared distances between corresponding mesh vertices. The root transformation that best aligns the two point sets, cancelling their difference in global placement, is found by a closed-form minimization.
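To make the idea concrete, the sketch below computes a point-cloud distance between two poses in two steps. It first solves, in closed form, for the rotation about the vertical axis and the translation in the floor plane that best align the two clouds. It then sums the squared distances between corresponding points. This is a generic least-squares (Procrustes-style) rigid alignment with uniform weights, written under our own assumptions:

- y is the up axis.
- Both clouds list corresponding points in the same order.
- The function name is hypothetical.

Kovar et al. give their own weighted closed-form solution, which we do not reproduce here.

```python
import math

Point = tuple[float, float, float]  # (x, y, z) with y pointing up


def aligned_distance(a: list[Point], b: list[Point]) -> float:
    """Sum of squared distances between corresponding points of `a` and `b`,
    after rotating `a` about the vertical (y) axis and translating it in the
    x-z plane so that it best matches `b` in the least-squares sense."""
    n = len(a)
    ax = sum(p[0] for p in a) / n
    az = sum(p[2] for p in a) / n
    bx = sum(q[0] for q in b) / n
    bz = sum(q[2] for q in b) / n

    # Closed-form optimal angle for 2D rigid alignment of the centred x-z coordinates.
    num = den = 0.0
    for (x1, _, z1), (x2, _, z2) in zip(a, b):
        x1, z1, x2, z2 = x1 - ax, z1 - az, x2 - bx, z2 - bz
        num += x1 * z2 - z1 * x2
        den += x1 * x2 + z1 * z2
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)

    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        # Rotate the centred point about y, then move it onto b's x-z centroid.
        dx, dz = x1 - ax, z1 - az
        rx = c * dx - s * dz + bx
        rz = s * dx + c * dz + bz
        total += (rx - x2) ** 2 + (y1 - y2) ** 2 + (rz - z2) ** 2
    return total
```

A pose that is merely rotated about the vertical axis and shifted across the floor has a distance of (numerically) zero from the original. A pose raised by one unit stays apart, because vertical offsets are not aligned away.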

In [46], Kulpa et al. describe a motion representation that is independent of character morphology and which encodes the constraints in the motion itself. This allows for easy transfer of motion between different characters, and facilitates the enforcement of spacetime constraints.

2.2 PCA

Principal Components Analysis is a statistical technique that is widely used for dimensionality reduction [8]. The result of performing PCA on a given dataset is a new vector space of the same dimensionality. Each axis in the space represents a principal component vector. Any point in the space is thus a weighted combination of the principal components. If the principal components are ordered according to the amount of variance that they describe in the original dataset, the variances typically show an exponential drop-off. It is this property that admits dimensionality reduction: a full data point can be represented with a predictable degree of fidelity by using some smaller subset of its PC coordinates.
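The predictability mentioned above can be made concrete. For the data the space was fitted on, the squared reconstruction error from keeping only the top k components is proportional to the sum of the discarded eigenvalues. A common rule of thumb, which is our assumption rather than a prescription of this thesis, is therefore to keep the smallest k whose components explain a chosen fraction of the total variance. A minimal sketch, with a hypothetical helper name:

```python
def components_needed(variances: list[float], fraction: float = 0.95) -> int:
    """Smallest k such that the k largest variances explain at least
    `fraction` of the total variance."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, v in enumerate(ordered, start=1):
        running += v
        if running >= fraction * total:
            return k
    return len(ordered)
```

Take variances that halve at each step: 8, 4, 2, 1, 0.5, 0.25. Four of the six components already explain more than 95% of the total, since 15 / 15.75 is about 0.952.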

The standard method for performing PCA on a set of n d-dimensional points is to first determine the sample mean, and subtract it from the data set. Next, the covariance matrix of the points is found. An eigenanalysis is then performed on the covariance matrix, yielding d eigenvectors and eigenvalues. Because the covariance matrix is symmetric, the eigenvectors can be chosen to be orthogonal, and they form the basis of the PCA space. Any d-dimensional point can be transformed into the PCA space by subtracting the sample mean and multiplying the result by the basis matrix. If the original data exhibits a low-dimensional linear structure (such as lying about an embedded plane), further data that conforms to the same structure, when projected into the space, can be represented using fewer than the full set of basis vectors with minimal loss of fidelity. Full-dimensional points are reconstructed by multiplying the projected points by the basis vectors, and re-adding the original sample mean.
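The procedure above maps directly onto a few lines of NumPy. This is a minimal sketch with hypothetical function names. `np.linalg.eigh` is used because it is intended for symmetric matrices such as a covariance matrix. It returns orthonormal eigenvectors with eigenvalues in ascending order, hence the re-sort.

```python
import numpy as np


def fit_pca(data: np.ndarray):
    """Standard PCA on an (n, d) array of n d-dimensional points.

    Returns the sample mean, a (d, d) basis matrix whose columns are the
    eigenvectors ordered by decreasing variance, and the matching eigenvalues.
    """
    mean = data.mean(axis=0)
    centred = data - mean
    cov = np.cov(centred, rowvar=False)       # (d, d) symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues, orthonormal vectors
    order = np.argsort(eigvals)[::-1]         # largest variance first
    return mean, eigvecs[:, order], eigvals[order]


def project(points: np.ndarray, mean: np.ndarray, basis: np.ndarray, k: int) -> np.ndarray:
    # Subtract the mean, then take coordinates along the first k principal directions.
    return (points - mean) @ basis[:, :k]


def reconstruct(coords: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    # Multiply by the corresponding basis vectors and re-add the sample mean.
    k = coords.shape[-1]
    return coords @ basis[:, :k].T + mean
```

Consider points sampled from a plane embedded in three dimensions. Their third variance is numerically zero, so they are reproduced almost exactly from their first two PC coordinates.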

An interesting twist on standard PCA is weighted PCA. Skocaj and Leonardis present a framework for wPCA in [63]. Working within a vision context, they seek to construct a PCA model of a video stream. They apply temporal and spatial weights to pixels in the video, denoting their relevance to the model. For example, occlusions in the camera’s field of view can be masked out, and periods of bad lighting or focus can be ignored.
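One simple form of weighting, per-sample (temporal) weights, changes only how the mean and covariance are computed. The sketch below illustrates that case alone. It is not Skocaj and Leonardis’ framework, which also supports spatial weights, and the function name is hypothetical. A zero weight drops a sample entirely, much as a badly lit or occluded frame would be ignored.

```python
import numpy as np


def fit_weighted_pca(data: np.ndarray, weights: np.ndarray):
    """PCA in which sample i counts with non-negative weight weights[i].

    A weight of zero removes a sample entirely; larger weights make a
    sample pull harder on the mean and on the principal directions.
    """
    w = weights / weights.sum()
    mean = w @ data                              # weighted sample mean, shape (d,)
    centred = data - mean
    cov = (centred * w[:, None]).T @ centred     # weighted covariance, shape (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # largest variance first
    return mean, eigvecs[:, order], eigvals[order]
```

As an example, take four frames spread along the x axis plus two outlier frames far off in y. Giving the outliers zero weight makes x the first principal direction. With uniform weights the outliers dominate and the first direction flips to y.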

2.3 Creating Motion

Once we have a way to represent poses, we can consider ways to generate them. Currently, motion data is quite expensive to acquire compared to other forms of multimedia. High-quality images, for example, can be taken with consumer-grade cameras, whereas collecting motion data requires either considerable technical and artistic expertise or specialized hardware.

2.3.1 Keyframing

The most common way to generate motion data is to meticulously build it by hand. In production studios, animators often use software packages like Alias’ Maya or Discreet’s 3D Studio Max. While techniques such as inverse kinematics and procedural animation help to reduce the workload, most time is spent setting up keyframes. Keyframing is a concept borrowed from traditional animation, in which a lead animator often draws only the most important frames in a sequence. Other animators then proceed to draw the “in between” frames. In computer animation, interpolation takes the place of the “in betweeners”.
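In its simplest form, computer in-betweening is linear interpolation of each degree of freedom between neighbouring keys. The sketch below shows that baseline for a single channel. The function name is hypothetical, and production packages generally offer smoother spline interpolation instead. Interpolating rotational channels one angle at a time like this also inherits the orientation problems noted in Chapter 1.

```python
from bisect import bisect_right


def interpolate_keys(keys: list[tuple[float, float]], t: float) -> float:
    """Piecewise-linear in-betweening of one degree of freedom.

    `keys` is a list of (time, value) pairs sorted by time. Times outside
    the keyed range are clamped to the first or last key's value.
    """
    times = [time for time, _ in keys]
    if t <= times[0]:
        return keys[0][1]
    if t >= times[-1]:
        return keys[-1][1]
    i = bisect_right(times, t)          # keys[i - 1] is at or before t, keys[i] after it
    (t0, v0), (t1, v1) = keys[i - 1], keys[i]
    u = (t - t0) / (t1 - t0)            # normalized position between the two keys
    return (1.0 - u) * v0 + u * v1
```

For keys at (0, 0), (10, 90) and (20, 45), frame 5 evaluates to 45.0 and frame 15 to 67.5.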

2.3.2 Rotoscoping

Rotoscoping is an animation technique that results in extremely life-like motion, because it is in fact drawn from live motion. The desired motion is first recorded on film or video, resulting in a series of frames. Drawings are then made over each frame, using the captured images as a reference. Rotoscoping can be used as a time-saving shortcut to producing traditional-looking animation, or as a means of creating stylized animations. Disney’s Snow White is an example of a film that did the former, and Linklater’s Waking Life is an example of one that did the latter.

2.3.3 Motion Capture

An alternative to keyframing is to use motion capture. Motion capture systems use various techniques to digitize the movements of an actor. Where rotoscoping generally only recovers the two-dimensional projection of the position of the actor’s body from the perspective of the camera, motion capture recovers a fully three-dimensional representation of the actor’s pose. For this thesis work, we had access to a Vicon 9 motion capture system. The Vicon 9 is vision based: the actor wears special reflective markers, which are viewed by an array of cameras. Given enough cameras to avoid self-occlusion, the locations of the markers can be found via computer vision techniques. Software provided by the manufacturer can be used to fit the reconstructed marker positions to an underlying skeletal model, and estimate the joint angles. Motion capture is useful for creating large volumes of realistic motion data quickly and easily, but it has several limitations. First, the equipment involved is expensive and awkward to use. Second, the resultant motion is limited to the realm of the possible. In order to get animation of superhuman feats, post-processing with traditional animation tools is required. Similarly, motion capture is hard to apply to non-human animal subjects, and impossible to apply to imaginary ones.

2.3.4 Simulation

As the computing power available to animators grows, simulation is becoming a more feasible option for generating certain kinds of character motion. A good general introduction to the concepts behind numerical simulation is the set of SIGGRAPH 1997 course notes prepared by Baraff and Witkin [4]. The notes start with a review of differential equations, and work their way up to rigid body dynamics and constrained dynamics, two subjects crucial for physical character animation.

Most work in physical character simulation focuses on specific behaviours or aspects

of motion. A good example of this is the controller-based work of Yang et al. [71]

A swimming character is intimately tied to its environment through full-body contact

with a viscous medium, so simulation works well to add the subtle interactions that an

animator might miss. A further example is the work of Hodgins et al. that deals with

animating human athletics [32]. In this case, specific motions that depend upon balance

or ballistics were simulated with a high degree of verisimilitude. Work on composable

controllers by Faloutsos et al. provides a framework for switching between specialized

controllers during a simulation to allow a simulated agent a wider repertoire [20]. This idea

was further explored in [21]. A mixture of kinematic animation and physical simulation

was used by Shapiro et al. in [60], where they implemented a supervisory controller

similar to Faloutsos’ which switched between animation methods depending upon the

circumstances in the scene.


Procedural controllers can also be used to drive kinematic animation. In [66], Sun

and Metaxas present a layered controller that uses a novel interpolation technique to

synthesize walking motion from database examples. Here, the plausibility of the resultant

motion is maintained through heuristics and the sample-remix nature of the data, rather

than from a physically correct simulation. In [55], Neff and Fiume present another use

of kinematic controllers. In this case, they use a heuristic approximation of balance

(amongst other things) to increase the expressive power of an IK solver. Taking the

complementary approach in [54], they use dynamics simulation to increase the expressive

range of PD controllers.

An interesting use of simulation is presented in [73]. In this work, Zordan and Hodgins

use motion capture data and IK techniques to drive very stiff controllers in a physical

simulation. When a contact occurs in the animation (such as a boxer getting punched),

the controllers are loosened to allow the dynamics to have a greater effect on the overall

motion. This technique allows for motion capture reuse, but maintains a degree of

interactivity.

2.3.5 Digital Puppetry

Techniques from puppetry have successfully been used to create nuanced motion perfor-

mances. Puppetry itself is an ancient art, but even the concept of remapping a person’s

movements to an exterior manipulator is not new. For example, Heinlein provided the

intellectual groundwork for telerobotics in his 1940 novella Waldo. It is a small step to

move from remapping one’s degrees of freedom to a robotic manipulator to remapping

them to a virtual character.

Remapping motions from one character to another is a common problem in computer

graphics, and a fundamental issue in virtual puppetry. One of the first papers to attempt

to solve this problem was Gleicher’s work [27], which transfers motion between characters

with the same skeletal structure, but different limb lengths. Key features of the input


motion, such as foot falls or interactions with external objects, are specified as constraints,

and the new motion is found via a non-linear optimization, using the input motion as a

starting point. Shin et al. present a framework for filtering real-time motion capture input

and remapping it to a virtual character [61]. This remapping is guided by constraints

which are deemed as having “dynamic importance”, which depends upon the context of

the motion itself. Shin’s system operates in real time, so the types of constraints that

can be specified are more limited than those in Gleicher’s off-line system, which is free

to optimize over all of spacetime.

Many characters that an animator or puppeteer might want to control will have nonstandard body configurations. In [19], Dontcheva et al. present a system that allows the

animator to interactively animate characters by manipulating motion-captured widgets

made from Tinker Toys. Mappings between the widgets and the character’s degrees of

freedom are built on the fly by the animator imitating the character’s movements. Since

the animator can only operate a few degrees of freedom at one time, complex animations

are built in multiple passes. This provides an intuitive, play-like interface that allows for

impressive results in a short amount of time.

In [48], Laszlo et al. take a different approach to the problem of mapping input DOFs

to performance DOFs. In this work, several different control schemes are presented that

relate mouse movements and keypresses to simulated motor controllers. The figures being

animated are physically simulated in real-time. The simulation provides a subtlety to the

motion beyond the raw data provided by an input device, and with some training, inter-

esting performances are possible. This process presents a highly interactive environment

that is closer to performance art than traditional animation.


2.4 Motion Processing

Most works take some variation of the skeletal representation developed in the previous

sections for granted. There is much more diversity of opinion when it comes to repre-

senting motion. We present an overview here by dividing the various approaches into

two camps: those that treat motion as a signal, and those that treat it as a progression

through a series of states.

2.4.1 Signal Based Techniques

Bruderlin and Williams’ “Motion Signal Processing” [12] provides a good introduction

to the signal theoretic approach to motion. In this paper, they introduce the concept

of multiresolution filtering for motion data. By applying band-pass filters to the indi-

vidual degrees of freedom of recorded motions, they are able to change the character of

the motions. They also present a multi-target interpolation technique that can be used

in conjunction with the dynamic time warp algorithm (see below) to produce blended

animations. They introduce motion displacement mapping, wherein motion signals are

locally altered to resemble example motions. The paper uses a straightforward Euler

angle parameterization, which limits its applicability to complex joints such as the shoulder. The interpolation techniques that they present, in particular, would suffer from rotation order artifacts that would not arise had they used quaternions.
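The band-pass decomposition described above can be sketched in Python for a single degree of freedom. This is a simplified illustration, not Bruderlin and Williams' exact filter: it assumes a 5-tap binomial kernel whose taps are spread apart by powers of two at each level (instead of subsampling the signal) and clamps indices at the ends of the clip. The function names are hypothetical. Because each band is the difference between two successive smoothings, the bands plus the final low-pass layer sum back to the input, and changing a band's gain changes only that frequency range of the motion curve.

```python
def smooth(signal, step):
    """One pass of the binomial kernel [1, 4, 6, 4, 1] / 16 with taps spaced
    `step` samples apart; indices are clamped at the ends of the signal."""
    kernel = (1, 4, 6, 4, 1)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(kernel):
            j = min(max(i + (k - 2) * step, 0), n - 1)
            acc += c * signal[j]
        out.append(acc / 16.0)
    return out


def band_decompose(signal, levels):
    """Split one DOF curve into `levels` band-pass layers (finest first)
    plus a residual low-pass layer."""
    bands = []
    current = list(signal)
    for level in range(levels):
        low = smooth(current, 2 ** level)
        bands.append([c - l for c, l in zip(current, low)])
        current = low
    return bands, current


def band_reconstruct(bands, lowpass, gains):
    """Recombine the layers, scaling each band by its own gain."""
    out = list(lowpass)
    for band, g in zip(bands, gains):
        out = [o + g * b for o, b in zip(out, band)]
    return out
```

With all gains equal to one the input curve is recovered; a call such as `band_reconstruct(bands, low, [1.5, 1.0, 1.0])` would exaggerate only the finest band.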

Li et al. present a novel signal-based motion editing technique in [49]. The central

conceit of this work is that the structure of a motion comes from its mid and low fre-

quency components, while its character (or “texture” in the paper) is expressed in higher

frequencies. By decomposing motion signals into Laplacian triangles, it is possible to

transfer “texture” from one example to another through a pattern-matching algorithm.

With this technique, motions that have been coarsely keyframed can be automatically

updated with detail from a previously completed (or motion captured) example.


Gleicher offers another perspective on the signal view of motions in [26]. In this work,

motions are modified through a displacement mapping procedure. The displacement map

is found through a constrained optimization, where the constraints are defined by the

animator. For example, the animator might specify that the figure’s hand must follow

a certain trajectory. Using automatic differentiation, the Jacobian of the world-space constraint parameters is expressed with respect to the motion’s native parameterization (Euler angle joint positions, in the paper). The optimization minimizes the weighted magnitude of the displacement vector. The spacetime framework is used again in Gleicher’s work on retargeting [27].

In “Verbs and Adverbs” [58], Rose et al. describe a technique for creating a space that

supports parameterized interpolation and extrapolation. Through manual segmentation

and mark-up, example motions are clustered into ‘verbs’ and described with subjective

‘adverbs’. A good example of this taxonomy would be two clips labeled as walking, with

a real-valued ‘jauntiness’ parameter. Given several parameters, a coordinate system can

be constructed to hold all examples of a particular verb. Interpolation values can then

be found using radial basis functions. New motions are synthesized by applying these

interpolation values directly to the motions’ individual Euler-angle valued degrees of

freedom. The paper also describes how to create ‘verb graphs’, which define transitions

between different verbs. A time-warping algorithm, along with spacetime constraints, is used to construct the transitions between the verbs.
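The radial basis function step can be sketched in Python for a single adverb. This is an illustration under assumptions, not Rose et al.'s formulation: it uses a Gaussian kernel of hypothetical width `width` and leaves out any polynomial or linear term a full formulation might include. Each example clip j receives a blend weight c_j(p) that equals 1 at its own adverb value and 0 at the others; since the kernel matrix A is symmetric, all the weights can be obtained together by solving A c = φ(p). The blended Euler-angle value is then the weighted sum of the examples' values. All names are hypothetical.

```python
import math


def gaussian(r, width):
    return math.exp(-(r / width) ** 2)


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def blend_weights(examples, p, width=1.0):
    """Per-example weights c_j(p), with c_j(examples[i]) = 1 if i == j else 0."""
    A = [[gaussian(abs(a - b), width) for b in examples] for a in examples]
    rhs = [gaussian(abs(p - a), width) for a in examples]
    return solve(A, rhs)


def blend_dof(examples, dof_values, p, width=1.0):
    """Blend one degree of freedom (e.g. an Euler angle at one keytime)."""
    c = blend_weights(examples, p, width)
    return sum(cj * vj for cj, vj in zip(c, dof_values))
```

For example, with three walks at jauntiness values `[0.0, 0.5, 1.0]`, evaluating at `0.5` returns the middle clip's value exactly, while values in between are smooth mixtures.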

2.4.2 State Based Techniques

An alternative to looking at motion as a collection of signals is to view it as a discrete

collection of poses. Each pose is represented by a particular configuration of the feature

vector (such as joint angles or vertex positions).

A series of papers by Kovar et al. play upon this idea [45, 44, 43]. These papers all

build distance tables between pairs of sampled motions. The table is then operated upon


to get various effects. The important distinction here is that the pose is the fundamental

entity. In Kovar et al.’s “Motion Graphs” [45], new motions are synthesized by finding

suitable sequences of poses from an example motion, much in the same way that video

frames are processed in [59].

Works that perform PCA upon motion data usually take a state-based perspective.

In [25], entire cycles of walking sequences are collected into vectors, and PCA is performed at a very high level. Points in the resulting space represent cycles. By projecting several parameterized examples into the space, it is possible to build axes. Points can then be sampled along the axes in order to interpolate or extrapolate the original parameterization.

Brand and Hertzmann’s Style Machines [11] uses PCA as well, but takes a more

pose-centric view. The main thrust of this paper was an application of Hidden Markov

Models to pose data. PCA was used to reduce the dimensionality of the dataset to make

the HMM training feasible. Given examples of similar movements performed in different

styles, related HMMs are trained, which indicate which portions of the two motions are

similar. The pairing of HMM states allows an animation to be synthesized which moves

between styles at will, but maintains a consistent choreographic structure.

A more recent work by Grochow et al. [29] fits a Scaled Gaussian Process Latent

Variable Model to pose data in order to construct a map of the likelihood of poses in a

given movement. This map can then be used as a part of the objective function of an

optimization process in order to perform inverse kinematics that conform to the ‘style’

of the constructing motion data. Again, the base unit of currency is the pose.

A novel use of PCA is found in the work of Barbic et al. on segmentation [5]. A space

is built from the raw quaternion representation of a motion. The inherent dimensionality

of the motion is calculated over time. The authors contend that a gross segmentation can

be made at the zero crossings of the derivative of the inherent dimensionality. They also

experiment with fitting a Gaussian Mixture Model to the projected data, and segmenting


based upon the resulting clusters. While the segmentation algorithms presented in this

paper work well, the motion representation used is limited in that interpolation is not

possible in the constructed PCA space. This precludes it from being used for synthesis.

2.5 Motion Segmentation and Recognition

Motion capture data is most often captured in long takes. Individual motions, if they

are to be used in a production, or as a part of an interactive application, must then

be segmented out of the original sequence. This is a repetitive and tedious task, so a system for automatically segmenting the data would be a boon. A related task is motion recognition. If motion data is to be used for real-time interaction, a method for interpreting the motion as it occurs must be implemented. Since motion capture equipment is not yet widespread, most of the relevant work in this area comes from the computer vision community. While a video of a person’s motions and motion capture data representing those motions differ greatly in representation, they describe the same underlying phenomena. In this research, motion capture is often viewed as a way to prototype advanced interaction techniques, under the assumption that the same functionality will become available through vision techniques in the future.

The boundaries between one motion and other motions are often ambiguous. A

person may express more than one gesture at a time with different parts of their body - for

example juggling while walking. For this reason, it is somewhat easier to segment motions

that have a sense of structure, such as sports, dance, or martial arts. An example of a

system designed for dance is [38]. In this paper, Kahol et al. derive velocity, acceleration,

and positional data for various body segments from motion capture data, and aggregate

the results into an observation vector. They then trained a Bayesian classifier with

manual segmentations provided by several different human choreographers. The trained

system was able to correctly predict 93 percent of the gesture boundaries produced by


the five choreographers when presented with novel motion data. This system emphasises

the subjective nature of segmentation. Even within a structured context, the various

human observers produced different results. That the system was able to predict each

observer’s style is most impressive.

Bobick and Wilson present a technique for recognizing motions in [10]. In this work,

motions are represented as trajectories, regardless of their source, such as from a 2D

mouse, a motion-captured point, or even the PCA projection of an image sequence. The

method hinges upon the notion of “state”: a gesture is defined as an ordered progression

through several regions of configuration space. A prototype trajectory is built from one

or more examples, and states are found via a clustering algorithm. When the trained

system is presented with a novel trajectory, a dynamic programming algorithm is used to

estimate its support from each gesture prototype. Overall, this technique is conceptually

similar to Hidden Markov Models. The authors state that the most important distinction

is that this method can build a prototype from a single example motion, whereas Hidden

Markov models require a larger training set.

Hidden Markov models [56] provide a method for modelling and predicting the be-

haviour of a time varying system. The model assumes that the system can be approxi-

mated by a stochastic state machine. The internal (“hidden”) states and state-to-state

transition probabilities of the system are determined solely through the observation of its

output. In motion terms, the states found by such a system would be gestures or actions

that a figure can exhibit, and the output would be an observation of the figure in some

form (such as joint angles, or video).
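To make the model concrete, the forward algorithm, which computes how likely a discrete HMM is to have produced an observation sequence, is sketched below. The quantization of poses into discrete symbols and the example parameters are illustrative assumptions. In a recognizer, one model would be trained per gesture and a new sequence assigned to the model with the highest likelihood. For long sequences these products underflow, so practical implementations rescale the forward variables at each step or work with log probabilities.

```python
def forward_likelihood(init, trans, emit, observations):
    """P(observations | model) for a discrete HMM via the forward algorithm.

    init[i]     probability of starting in hidden state i
    trans[i][j] probability of moving from state i to state j
    emit[i][o]  probability that state i emits observation symbol o
    """
    n = len(init)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)
```

For a two-state model with `init = [0.5, 0.5]`, `trans = [[0.9, 0.1], [0.1, 0.9]]` and `emit = [[0.8, 0.2], [0.3, 0.7]]`, the sequence `[0, 1]` has likelihood 0.1975, and the likelihoods of all four length-2 sequences sum to one.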

Many papers have been written applying HMMs to various gesture recognition tasks.

In [70], Wilson and Bobick describe a vision-based system that trains an HMM online,

effectively learning new gestures on the fly. Starner and Pentland use HMMs to track

American sign language through video [64]. In vision-based techniques, the feature vector

used is of utmost importance. Campbell et al. explore the effectiveness of various feature


vectors for recognising Tai Chi movements with an HMM in [13]. Beck created a

vision-based system for Tai Chi training using HMMs in [6].

An interesting image-processing based motion recognition framework is presented by

Davis and Bobick in [18]. This system makes extensive use of Motion History Images

(MHIs). MHIs are produced by extracting binary segmentations of the foreground figures

in a sequence of video frames. These binary images are then superimposed over each

other, with an intensity keyed to their frame index. The resulting images are quite

distinctive (and interesting artistically), and are amenable to standard image recognition

techniques. MHIs were used extensively in [23].
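The superposition described above can be sketched in a few lines. This sketch assumes the silhouettes are nested lists of 0/1 values and that a pixel's intensity is its most recent foreground frame index, normalized to (0, 1]. Davis and Bobick's formulation also includes a temporal-extent parameter, which is not modelled here, and the function name is hypothetical. Later frames overwrite earlier ones, so the most recent motion appears brightest.

```python
def motion_history_image(masks):
    """masks: binary silhouettes (lists of rows of 0/1), oldest frame first.

    Returns an image where each pixel holds (t + 1) / T for the most recent
    frame t in which it was foreground, and 0.0 where it never was."""
    T = len(masks)
    h, w = len(masks[0]), len(masks[0][0])
    mhi = [[0.0] * w for _ in range(h)]
    for t, mask in enumerate(masks):
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    mhi[y][x] = (t + 1) / T
    return mhi
```

A blob moving one pixel to the right per frame over three frames leaves a ramp of intensities 1/3, 2/3, 1 behind it, which is the kind of distinctive pattern that standard image recognition techniques can match.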

2.6 High Dimensional Data Search Techniques

In chapter 6, we develop a method to search a long motion for segments of high similarity

to a short query segment. Searching within a set of loosely-ordered, high-dimensional

data points is a difficult task, and an area of active research. In this section we will first

discuss some general strategies for high dimensional searching, then present some relevant

results in the specific field of motion searching. We finish by presenting details about the

dynamic time warping (DTW) algorithm, a technique that we use in our search algorithm.

One way to search high-dimensional data, such as pictures, sound, or motion capture

data, is through markup. ‘Markup’ refers to textual annotations that are added to

data. Search is then done by proxy on the text. The MPEG-7 standard [50] describes

a framework for multimedia mark-up. The standard covers multiple media types, and

can be extended to cover others. Features in the media stream are assigned descriptors

according to a schema based upon the media type. These descriptors can then be queried

to navigate the media stream.

One thing that the MPEG-7 standard does not address is how to apply the markup

in the first place. Certain features can be extracted from a stream automatically, but


higher level features that require an understanding of the stream, or outside knowledge,

must be found manually. In chapter 7 we suggest (as future work) a method to semi-

automatically apply subjective markup to motion capture data using a search algorithm.

More sophisticated results may be realized with modern machine learning techniques.

If meta-data is not used in a search, it can be difficult to phrase the query. One

strategy that has been successful for several types of data is query-by-example. In a QBE

system, the user provides a sample data point, and the system returns other points that

it deems similar. A survey of such systems is provided in [72].

The key component of a similarity-based search algorithm is a well-defined distance

measure between the data points that are visited. Unfortunately, efficient and robust

distance measures are hard to design for many types of media. Jacobs, Finkelstein, and Salesin

present a wavelet-based search method for static images in [36]. Their method transforms

an entire image into a robust and much more compact signature. The signatures that

they define are robust enough that the user can specify a very rough version of the image

as a search key. This lends itself to an intuitive sketching interface.

Using a signature for searching works well for discrete entities, like whole images, but

is not applicable to motion data, where potential matches take the form of subintervals

within a much larger time-series. Fortunately, there are several techniques for finding

similarities in sequences. Hidden Markov Models, which were introduced above in our

discussion on segmentation, are a good candidate. In [68], Valivelli et al. use HMMs to

implement an example-based search for audio data. A model of “uninteresting” sound

is built from a large library of noises that do not match the query clip. This model is

then used in conjunction with a model of the query to find matching regions in the input

stream.

Motion data is still quite scarce, so few authors have addressed the task of searching

through it. As motion data becomes more prevalent, however, research in this direction

is starting to appear. In [44], Kovar and Gleicher create an exhaustive table of the


inter-pose difference between two motion sequences of arbitrary length. With some post-

processing, this table can be used to quickly find matches for segments from one motion

in the other. While useful for certain applications, such as the parametric extraction task

which is the major focus of their paper, the long pre-processing time precludes it from

use with novel or real-time queries.

Dynamic time warping is a technique that is traditionally associated with speech

recognition, but is often applied to other signals as well. DTW defines a non-linear

correspondence between two signals, effectively stretching and compressing one of them

to match the other. The algorithm is computationally expensive, and is solved using

dynamic programming [7]. Conceptually, the two signals are arranged along the axes of

a two dimensional matrix. This matrix is filled with the pair-wise sample distances of the

two signals. Starting from any index in the matrix, an optimal alignment can be found

by accumulating the minimum distance forward and backward to the boundaries of the

matrix. In most situations, the high computational cost of the algorithm stems from the

filling of the distance table.
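The dynamic programming recurrence can be sketched in Python as follows. This is the textbook variant that aligns both signals from their first samples to their last, not a search variant that starts from an arbitrary cell. Scalar samples and the absolute-difference distance are assumptions; for pose data, `dist` could be replaced by a pose distance. As the text notes, the double loop that fills the table dominates the cost, at O(nm) distance evaluations.

```python
import math


def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return (total cost, alignment path) of the optimal warp between a and b."""
    n, m = len(a), len(b)
    # D[i][j] = cheapest cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the far corner to recover which samples were matched.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): D[i - 1][j - 1],
                 (i - 1, j): D[i - 1][j],
                 (i, j - 1): D[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return D[n][m], path
```

For example, `dtw([0, 1, 2], [0, 1, 1, 2])` has zero cost: the warp stretches the first signal by matching its middle sample to both 1s in the second.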

Bruderlin and Williams applied DTW to animation parameters in [12], which we

have previously mentioned. Kovar and Gleicher have used it to align motion clips before

interpolation [43], and there is active research within the data mining community to

improve upon the basic algorithm [16, 41].

Not all motion matching systems use DTW to align signals. In [14], Cardle et al.

present a system for motion searches based upon the Longest Common Subsequence-based multidimensional trajectory comparison measure proposed by Vlachos et al.

[69]. Keogh et al. use uniform scaling to match signals globally in [40], avoiding the

degenerate over-fit warps to which DTW is prone.
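For comparison with DTW, a Longest Common Subsequence measure can be sketched for one-dimensional sequences. Two samples match when they differ by less than `eps` and their indices differ by at most `delta`; both thresholds, and normalizing by the shorter length to obtain a similarity, are illustrative choices. A multidimensional version would apply the threshold to every coordinate. Unmatched samples are simply skipped, which makes the measure tolerant of outliers, whereas DTW must align every sample with something.

```python
def lcss(a, b, eps, delta):
    """Length of the longest common subsequence under tolerance eps and
    index window delta."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < eps and abs(i - j) <= delta:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


def lcss_similarity(a, b, eps, delta):
    """Similarity in [0, 1]: matched samples over the shorter length."""
    return lcss(a, b, eps, delta) / min(len(a), len(b))
```

For instance, `[1.0, 2.0, 3.0, 4.0]` and `[1.1, 2.05, 9.0, 4.1]` share three matches with `eps=0.2, delta=1`; the outlier `9.0` is ignored rather than forcing a costly alignment.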


2.7 Summary

In this chapter we have presented the background materials that define the context of

our work. We began by discussing various methods of representing poses, and described in detail the dominant skeletal hierarchy method upon which our work is based. We then explored various methods of creating motion data from which poses can be extracted. Next, we considered two perspectives on the problem of processing motion data once it has been created. We finished the chapter with brief overviews of motion segmentation, recognition, and search techniques. More specific references for particular techniques are provided in context throughout the remainder of the document, particularly the background work related to the creation of Motion Curves in Chapter 3.

Chapter 3

Motion Representation

In this chapter we introduce motion curve space, a representational framework for pose

and motion data. We begin by highlighting the problems with other pose representations

that motivated the development of motion curve space. Next, we explain the steps that

must be taken to construct a motion curve space from example data. We then briefly

describe the features of the space, foreshadowing the detailed descriptions in the following

chapters. A visualization method for the motion curve representation, which is used in

almost all of the prototype applications that were developed for this thesis, is described

next. Finally, we present a method for building statistical models of poses in motion

curve space in order to recognize those poses within novel motion clips.

3.1 The Trouble with Motion Data

In its raw form, motion data is not easy to work with. Much of the difficulty stems from

the lack of an inherent distance function between poses. Researchers have used many

different approaches in their own motion work, such as the deformed point-cloud method

described by Kovar et al. [45], or the weighted sum of quaternion distances proposed

by Johnson [37]. We present a weighted-PCA based representation for poses that has a

Euclidean distance metric. The simple distance metric allows for the direct application of


standard data processing techniques. Being PCA-based, our representation also benefits

from having a coarse-to-fine interpretation, which may allow for less accurate, but quicker

distance calculations.

3.2 Motion Curve Space

A motion curve space is constructed using a motion clip. The choice of clip is very

important, because the joint angle correlations that it contains are reflected in the distribution of the axes in the resulting space. The clip should be long enough to explore the full range

of motion for each joint in the figure. If a joint is not fully exercised in the example clip,

certain valid poses may fall outside of the span of the space. Since it is impossible to

represent out-of-span poses accurately, the use of motion curve space becomes lossy. We

will analyze the error inherent in our representation after we describe how to construct

a space.

When working with motion capture data, we usually use a space created from the

range of motion test data that was used to calibrate the capture system. This ensures

that the maximal amount of variance is introduced during the construction of the space.

Sometimes such clips are not available. For example, in Chapter 6, we use synthetic

motion data created using dynamic simulation and controllers. The controllers were very

simple, and incapable of fully exploring the space of possible poses. In this case, we used

the data clip that we were operating on to create the space. The scarce input data led

to a space with a small span, and thus less expressive power. It was acceptable for the

purposes of the search algorithm that we were testing, however, because we were not

trying to create and represent new poses.


3.2.1 Constructing the space

Creating a motion curve space is a two-step process. The first step is to linearize the

quaternions of the example clip, and put the data into a matrix form. The second step

is to apply the weighted PCA algorithm, and orthogonalize the resulting vectors. Each

step will now be explained in detail.

Linearizing a unit quaternion brings it from a four-element vector constrained to the unit hypersphere to an unconstrained three-element real vector. The advantage of doing so is that one does not have to worry

about keeping the vector normalized: all possible vectors in R3 correspond to a valid

rotation. Grassia explains the procedure in [28], and we outline it here for completeness.

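A unit quaternion can be expressed in Euler form as

$$Q_T = e^{\mathbf{n}\frac{\theta}{2}}, \qquad (3.1)$$

where $\mathbf{n}$ is the unit axis of rotation, and $\theta$ is the angle of rotation. The quaternion is linearized by taking its logarithm in this form:

$$\log Q_T = \log e^{\mathbf{n}\frac{\theta}{2}} = \mathbf{n}\frac{\theta}{2}.$$

The computational mapping below uses the rotation vector $\vec{v} = \mathbf{n}\theta$, i.e., twice the quaternion logarithm, as in Grassia's exponential map [28]; the constant factor does not affect linearity. The mapping from $\vec{v}$ to the quaternion $[\,w\; x\; y\; z\,]$ is implemented as:

$$\theta = |\vec{v}|, \qquad w = \cos\frac{\theta}{2}, \qquad [\,x\; y\; z\,] = \vec{v}\,\frac{\sin\frac{\theta}{2}}{\theta}.$$

And the reverse operation is:

$$m = \frac{2\arccos w}{|[\,x\; y\; z\,]|}, \qquad \vec{v} = m\,[\,x\; y\; z\,],$$

with $\vec{v} = 0$ for the identity rotation, where $[\,x\; y\; z\,] = 0$.

Derivations for these operations can be found in [37] and [28].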
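A direct transcription of the two mappings is sketched below in Python, with quaternions stored as `(w, x, y, z)` tuples; the function names are hypothetical. Two details are choices of this sketch rather than part of the text: the angle is computed with `atan2(|[x y z]|, w)`, which equals `arccos w` for unit quaternions but is better conditioned near the identity, and a quaternion with negative `w` is negated first, since q and -q encode the same rotation. As a consequence, rotations by more than π come back as the equivalent shorter rotation. The near-identity branches use the limit sin(θ/2)/θ → 1/2.

```python
import math


def exp_map(v):
    """Rotation vector v = n * theta  ->  unit quaternion (w, x, y, z)."""
    theta = math.sqrt(sum(c * c for c in v))
    if theta < 1e-12:
        # sin(theta/2)/theta -> 1/2 as theta -> 0 (near-identity rotation).
        return (1.0, 0.5 * v[0], 0.5 * v[1], 0.5 * v[2])
    s = math.sin(theta / 2.0) / theta
    return (math.cos(theta / 2.0), s * v[0], s * v[1], s * v[2])


def log_map(q):
    """Unit quaternion (w, x, y, z)  ->  rotation vector n * theta."""
    w, x, y, z = q
    if w < 0.0:
        # q and -q are the same rotation; keep theta in [0, pi].
        w, x, y, z = -w, -x, -y, -z
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:
        return (2.0 * x, 2.0 * y, 2.0 * z)
    m = 2.0 * math.atan2(s, w) / s
    return (m * x, m * y, m * z)
```

Any vector of length below π survives the round trip `log_map(exp_map(v))` up to rounding error, which is what makes the linearized coordinates safe to manipulate freely.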

The linearization of a quaternion is performed with respect to some reference orienta-

tion. This is done by ‘rotating out’ the reference via quaternion multiplication before the

log is taken. The choice of reference orientation is very important, because the accuracy of an interpolation between two linearized orientations decreases as their distance from the reference grows. The orientations that show up in hierarchical skeletal models are often based upon actual skeletal joints. Real joints usually have tightly constrained bounds, and in most natural cases, the movement will tend to fall into an even tighter comfortable range. We exploit these features and use each joint’s sample mean orientation as its reference. The procedure for finding an estimate for the mean of a set of quaternions is discussed in detail in section 5.2. The linearized quaternions of a pose are concatenated to create a vector of length 3 × DOF, where DOF is the number of rotational joints. These vectors will be the observations in the WPCA algorithm.
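The reference removal and concatenation can be sketched as follows. The multiplication order, left-multiplying by the conjugate of the joint mean, is an assumption, because the text does not say on which side the reference is rotated out. The helper names and the `(w, x, y, z)` storage order are also hypothetical. The log map uses the rotation-vector convention above, so a pose of J joints yields a vector of length 3J.

```python
import math


def q_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def q_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def log_map(q):
    """Unit quaternion -> rotation vector n * theta (see the mapping above)."""
    w, x, y, z = q
    if w < 0.0:
        w, x, y, z = -w, -x, -y, -z
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:
        return (2.0 * x, 2.0 * y, 2.0 * z)
    m = 2.0 * math.atan2(s, w) / s
    return (m * x, m * y, m * z)


def pose_vector(joint_quats, joint_means):
    """Concatenate the linearized joint rotations of one pose, each taken
    relative to that joint's stored mean orientation."""
    vec = []
    for q, mean in zip(joint_quats, joint_means):
        # Rotate out the reference, then linearize what remains.
        vec.extend(log_map(q_mul(q_conj(mean), q)))
    return vec
```

A pose equal to the mean pose maps to the zero vector, and a joint rotated 0.5 rad about z relative to a 0.3 rad mean about z contributes `(0, 0, 0.2)`.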

We can use the pose vectors created during the linearization step to construct a PCA

space. Such a space, however, will not take into account the hierarchical nature of the

pose data. Perceptually speaking, a few degrees of change in the angle of a shoulder

changes the shape of a pose much more than a similar change in a toe. In fact, ‘noisy

toes’ can threaten to dominate the PCA space, and lead to an inefficient distribution of

the motion’s degrees of freedom over the principal components. This in turn increases

the number of dimensions that must be used to produce acceptable looking motion.

In order to prevent this, we use weighted PCA. Skocaj and Leonardis present a wPCA

formulation for vision applications, wherein weights can be applied to both subsections


of individual frames, and to entire frames [63]. We use only the former, and apply a

real-valued weight to each joint. The specific weights used can be manipulated to change

the properties of the resulting space, as we will show later. In the general case, we

use weights that are derived from an approximation of the relative amount of body mass

that is influenced by the movement of each joint. Pseudocode for the wPCA construction

algorithm is given in algorithm 1.

Algorithm 1 Creating the wPCA space
Ensure: X ← linearized ROM data
Find the mean pose
for all samples in ROM do
    for all DOF do
        Rotate out the mean quaternion
        Linearize the result
        Accumulate in matrix X
    end for
end for
Ensure: X = wU × A
U ← random values
its ← 0
reconError ← ∞
while (its < maxIts) ∧ (reconError > ε) do
    E Step: QR solve for projection A
    M Step: LU solve for space vectors U
    reconError ← weighted reconstruction error of U × A against X
    its ← its + 1
end while
return the orthogonalized columns of U as the PCs
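Algorithm 1 fits all components at once, using QR and LU solves in its E and M steps. The Python sketch below shows the same alternating idea for a single component only, where both solves reduce to scalar divisions; it is an illustration under stated assumptions, not the thesis implementation. The rows of `X` are assumed to be mean-centred pose vectors, and `w` holds one weight per coordinate, so a joint's weight would be repeated for its three linearized coordinates. The names are hypothetical. With a single component, the final normalization plays the role of the orthogonalization step.

```python
import math
import random


def weighted_rank1_em(X, w, iters=200, tol=1e-12, seed=0):
    """Leading weighted principal direction of the mean-centred rows of X.

    Minimises sum_i sum_j w[j] * (X[i][j] - a[i] * u[j]) ** 2 by alternating
    an E step (coefficients a given u) and an M step (basis u given a)."""
    rng = random.Random(seed)
    d = len(w)
    u = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    prev_err = math.inf
    for _ in range(iters):
        # E step: weighted least-squares coefficient of each row on u.
        denom = sum(w[j] * u[j] ** 2 for j in range(d))
        a = [sum(w[j] * u[j] * row[j] for j in range(d)) / denom for row in X]
        # M step: per-coordinate least squares for u (w[j] cancels out).
        aa = sum(ai * ai for ai in a)
        u = [sum(a[i] * X[i][j] for i in range(len(X))) / aa for j in range(d)]
        # Weighted reconstruction error; stop once it no longer improves.
        err = sum(w[j] * (row[j] - ai * u[j]) ** 2
                  for row, ai in zip(X, a) for j in range(d))
        if prev_err - err < tol:
            break
        prev_err = err
    norm = math.sqrt(sum(c * c for c in u))
    return [c / norm for c in u]
```

With `X = [[2, 0], [-2, 0], [0, 1], [0, -1]]`, equal weights give an axis along the first coordinate, which has more variance, while `w = [1, 100]` moves the axis to the second coordinate. This mirrors how the joint weights decide which movements occupy the lower dimensions of the space.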

We construct the motion curve space using an offline application. The principal components, along with their corresponding eigenvalues and the joint means, are saved to a file. Any number of these files, each built with different weightings or reference datasets, can be used during a session with the interactive programs that we will describe over the course of this document.

3.2.2 Projections and Unprojections

Motions can be expressed within a space by projecting them into it. They can be taken

out of the space (after modification, for example) through the process of unprojection.

Before projection, a pose must be linearized. This is done in the same way as it was

for the construction of the space, except the stored joint means are used. The projection

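itself is then a matter of a simple vector-matrix multiplication. The projected coordinates $\vec{p}$ can be found by multiplying the row pose vector $\vec{v}$ by a matrix $B$ whose columns are the space’s basis vectors:

$$\vec{p} = \vec{v}B. \qquad (3.2)$$

Unprojection is equally simple. First, the opposite multiplication is made:

$$\vec{v} = \vec{p}B^{-1}. \qquad (3.3)$$

By construction, the columns of $B$ are orthonormal, so $B^{-1}$ is simply $B^T$ (when only the leading components are kept, $B$ is not square and $B^T$ acts as its pseudo-inverse). Given $\vec{v}$, a quaternion representation of the pose can be found by applying the exponential map to each joint’s three coordinates and rotating the stored joint mean back in.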

The matrix multiplications in these operations are readily optimized. Multiple poses

can be concatenated into matrices for batch processing in both directions. The log and

exponential mappings are more expensive, since they involve the evaluation of square

roots and trigonometric functions.
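Equations 3.2 and 3.3 translate directly into code. In this hypothetical Python sketch, `basis` is a list of orthonormal basis vectors (the columns of B), so projection is a list of dot products and unprojection is a weighted sum of basis vectors. Keeping fewer basis vectors than dimensions makes the round trip lossy: any component of the pose vector outside the span is discarded, which is the error discussed in Section 3.2.

```python
def project(v, basis):
    """p = v B: one dot product per orthonormal basis vector."""
    return [sum(vi * bi for vi, bi in zip(v, b)) for b in basis]


def unproject(p, basis):
    """v = p B^T: the coordinate-weighted sum of the basis vectors."""
    d = len(basis[0])
    v = [0.0] * d
    for pk, b in zip(p, basis):
        for i in range(d):
            v[i] += pk * b[i]
    return v
```

With `basis = [[1, 0, 0], [0, 0.6, 0.8]]`, the vector `[2, 3, 4]` projects to `[2, 5]` and is recovered exactly, while `[0, 0.8, -0.6]`, which is orthogonal to both basis vectors, unprojects to zero.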

A single pose projects to a point in high-dimensional motion curve space. A sampled

motion that is made up of multiple sequential poses projects into a time-ordered series


of points. As we shall see in later sections, this representation lends itself to a geometric

interpretation. The fact that the lower dimensions of the projection can be visualized

geometrically reinforces the metaphor.

3.3 Space Characteristics

Various operations that are complex to perform with the original quaternion-based mo-

tion representation are greatly simplified using the wPCA representation. In this section,

we discuss several of the characteristics of motion curve space that make it useful for

working with motion.

3.3.1 Pose Distance Metric

The most significant feature of motion curve space is that it has an implicitly defined

distance metric. Since it is by construction a real vector space, the L2 norm can be used

as a metric. In practice, however, we usually scale each axis of the space before applying the norm, to take into account the relative amount of variance captured in each axis. If ~v = [v1, v2, ..., vn] is a vector containing the eigenvalues from the orthogonalization step of the space construction, the distance metric for comparing poses ~p and ~q in an n-dimensional space is written as:

$$d(\vec{p}, \vec{q}) = \sqrt{\sum_{i=1}^{n} v_i^2 \, (p_i - q_i)^2} \qquad (3.4)$$

It is sometimes advantageous to truncate the sum when evaluating the distance, estimating the high-dimensional deviation between poses with a lower-dimensional approximation. Relative distances between sets of points are not preserved under such a projection: two poses that are far apart in the full space may appear close once the higher dimensions are dropped. Such projection errors are made less frequent by the fact that, in many cases, the v_i decrease roughly exponentially with i. Still, projection errors can arise in certain situations, such as the reduced-dimension Approximate Nearest Neighbour search described in chapter 6.
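The following sketch implements equation 3.4 with optional truncation and shows how truncation can reorder neighbours. The function name `pose_distance` and the toy coordinates are illustrative assumptions, not taken from the thesis software.

```python
import math


def pose_distance(p, q, eigvals, k=None):
    """Weighted pose distance of Eq. 3.4 between two projected poses.

    eigvals holds v_1..v_n (largest first). When k is given, only the first k
    axes are summed: a cheaper estimate that never exceeds the full distance,
    because the dropped terms are all non-negative.
    """
    n = len(p) if k is None else k
    return math.sqrt(sum(eigvals[i] ** 2 * (p[i] - q[i]) ** 2 for i in range(n)))


# Truncation can reorder neighbours: with k = 1, b looks closer to q than a does,
# even though it is much farther away in the full space.
q, a, b = [0.0, 0.0], [1.0, 0.0], [0.5, 5.0]
w = [1.0, 1.0]
full = (pose_distance(a, q, w), pose_distance(b, q, w))
truncated = (pose_distance(a, q, w, k=1), pose_distance(b, q, w, k=1))
```

With equal weights the reordering is easy to provoke; when the v_i fall off quickly, the dropped terms are small and such reorderings become rarer.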


The weighting scheme used during the wPCA phase of space construction is reflected

in the distance metric. Movement in joints that were weighted heavily is represented in the lower dimensions of the space, which therefore have much larger ~v coefficients. An

animator thus has some control over the nature of the distance metric. By strategically

weighting different joints, it is possible to build spaces that have distance metrics suited

to specific tasks. For example, if an animator is working with walk cycles, s/he might

decide to weight the joints of the legs higher than those of the upper body. This will

cause two poses that have similar leg orientations and dissimilar arm orientations to be

considered as closer together than two poses with similar arm orientations and dissimilar

leg orientations.

3.3.2 Dimensionality Reduction

One of the primary uses of PCA is to reduce the dimensionality of a dataset. By combin-

ing correlated axes, PCA allows for a data point to be represented by fewer coordinates

than in its natural form, with some loss in fidelity. For the size of data that we work

with (usually 57 degrees of freedom), we have found that this is not needed to attain

interactive manipulation rates with the techniques that we have developed. Since it does

not cost much, it is usually best to use the full set of bases for reconstruction. Artifacts typically become noticeable on the 57 DOF dataset as soon as anything more than the spurious DOFs has been removed. The essential character of the motion survives much more aggressive truncation, with most motions recognizable with as few as 3-5 DOF, but the fidelity at that level is not acceptable for most applications. We do use dimensionality reduction,

combined with different weighting schemes, to target the pose distance metric and direct

the search algorithm developed in chapter 6. PCA also guarantees that the bases of the

motion curve space that we produce are orthogonal.

Given our representation, some dimensionality reduction is natural, however. A phys-

ical knee joint has only one degree of freedom, barring the bending that we are already


abstracting away in our model. Synthetic knee joints, such as from a physical simulation,

are even more likely to have information for only one degree of freedom. By representing

every orientation in our skeleton using a quaternion, we inflate the number of degrees of

freedom for the sake of consistency. Luckily, the wPCA procedure finds all of these spu-

rious degrees of freedom, and relegates them to the lowest-value principal components,

where they can be safely ignored.
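A minimal sketch of how the spurious components might be identified from the eigenvalues follows. The relative tolerance is an arbitrary assumption, and the variance-fraction rule in `dims_for_variance` is a common PCA heuristic, not the procedure used in this thesis; both function names are hypothetical.

```python
def effective_dims(eigvals, rel_tol=1e-6):
    """Count principal components whose eigenvalue is not negligible.

    Components below rel_tol * (largest eigenvalue) are treated as the
    spurious degrees of freedom introduced by using a quaternion for every joint.
    """
    top = max(eigvals)
    return sum(1 for lam in eigvals if lam > rel_tol * top)


def dims_for_variance(eigvals, fraction):
    """Smallest k such that the first k components capture `fraction` of the variance."""
    ordered = sorted(eigvals, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        if running >= fraction * total:
            return k
    return len(ordered)


# Toy spectrum: two components are effectively zero (e.g. extra knee DOFs).
spectrum = [5.0, 3.0, 1.5, 0.5, 0.0, 1e-12]
```

For this toy spectrum, four components survive the tolerance test, and three capture 90% of the variance.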

3.3.3 Visualization

Motion, being time dependent, is challenging to visualize. One of the most compelling

features of motion curve space is that it lends itself to a natural visualization, which

presents the entire motion as a static entity outside of time.

We visualize motion curve space in three dimensions by displaying the lowest three

dimensions of the space. Poses can be rendered as points in the space. Sequential poses

from a motion can be joined using line segments (or even higher-order polynomials) to

reinforce the sense of continuity. We navigate the space using a mouse-dragging interface

similar to the one used in Maya. The camera is locked in a spherical coordinate system

built around a focal point. Left-dragging the mouse orbits the view position about the focal point (which is rendered as a small coordinate axis). Right-dragging moves the focal point and the view-local ground plane, and middle-dragging moves the focal point within the view plane. Scrolling the mouse wheel adjusts the camera’s distance to the focal

point. Figure 3.1 shows an example motion projection.
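The spherical camera described above can be sketched as follows. The y-up convention, radian angles, drag sensitivity, and zoom factor are all assumptions (the prototype's conventions are not specified), and the function names are hypothetical.

```python
import math


def orbit_eye(focal, azimuth, elevation, distance):
    """Eye position on a sphere of radius `distance` centred on `focal` (y is up)."""
    fx, fy, fz = focal
    horizontal = distance * math.cos(elevation)
    return (fx + horizontal * math.sin(azimuth),
            fy + distance * math.sin(elevation),
            fz + horizontal * math.cos(azimuth))


def left_drag(azimuth, elevation, dx, dy, sensitivity=0.01):
    """Orbit about the focal point; elevation is clamped short of the poles."""
    limit = math.pi / 2 - 1e-3
    elevation = max(-limit, min(limit, elevation + sensitivity * dy))
    return azimuth + sensitivity * dx, elevation


def scroll(distance, clicks, factor=0.9, min_distance=1e-3):
    """Mouse wheel: move the eye toward (clicks > 0) or away from the focal point."""
    return max(min_distance, distance * factor ** clicks)
```

Clamping the elevation keeps the camera from flipping over when it passes directly above or below the focal point.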

The appearance of a motion visualized using this system depends upon the content

of the principal components. The same motion, when viewed under projection into two

separate spaces, can appear drastically different. The features of the motion that are

reflected in the visualization are determined by which joints are controlled by the lowest principal components, something that the animator can control indirectly through the choice of weighting scheme during space construction. This is a beneficial feature: the visualization simplifies the data, while letting the animator choose which aspects of the motion to see.

Figure 3.1: An example projection

In figure 3.2, we show several steps of a walking motion projected into two spaces. The left-hand space uses our standard weighting scheme, while the other uses a scheme that is weighted heavily toward the leg joints. The phase structure of the walk cycle is visible in both examples, because walking is a highly coordinated full-body motion. For more localized motions, such as punching or tapping, the animator may need to experiment with specialized weighting schemes to discover the motions’ structure.

3.3.4 Representational Error

As mentioned earlier, the projection of a pose into motion curve space is guaranteed to

be reversible if the pose was part of the dataset used to create the space, and all of the

space’s dimensions are used. In other cases, some error may be introduced. The amount

of error depends upon the rank of the projection matrix. Degenerate spaces, made from

motion clips that do not exercise every joint in the skeleton, cannot be made to represent

those joints that were not used. When using motion capture data, the projection matrix will almost always have full rank, but rank deficiency can be a problem when working with synthetic data.

Figure 3.2: Two steps of a walking motion projected into two different spaces

In order to determine the effect of using different datasets for space construction,

we built several spaces using clips with different characteristics (the default weighting

scheme was used in each case):

• Full Range of Motion Test. This clip is a recording of the trial used to calibrate

the motion capture array. The actor starts in the T-pose, and then proceeds to

exercise each major joint in isolation. He finishes with some walking and stretches.

The total length of the clip is about 128 seconds.

• Truncated Range of Motion Test. This clip is 20 seconds, taken from the

middle of the range of motion test.

• T-pose. This clip is 3 seconds of the actor standing in the T-pose.

• Assorted Moves. This clip is approximately 70 seconds of the actor performing

various Aikido movements.

The reconstruction error was evaluated subjectively using a small application that lets the user view an animation alongside its reconstruction. The number of bases used for the reconstruction is user-specified with a slider. Unsurprisingly, the full range of motion test clip produced the space with the best properties. Motion reconstructed with as few as 25 bases (out of 56) passed visual inspection, and no glaring artifacts were present at any level of reconstruction. The T-pose and truncated ROM trials led to similar spaces: reconstruction with the full set of bases (minus the redundant DOFs) was perfect, but the reconstructions did not degrade gracefully as the number of bases was reduced. Using the assorted clip with a reduced number of bases caused a reconstruction artifact that resulted in contorted poses, but using the full set of bases fixed the problem.
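The slider experiment can be mimicked numerically with a sketch like the following. It assumes `basis` is a list of orthonormal basis vectors ordered by decreasing eigenvalue, and it uses a Euclidean error in linearized coordinates, which is an assumption (the thesis evaluated reconstructions visually). The function names are hypothetical.

```python
def reconstruct(pose, basis, k):
    """Project a linearized pose onto the first k basis vectors and map it back.

    Coordinates on the discarded vectors are set to zero; since poses are
    linearized about the joint means, this roughly leaves those directions at
    the space's mean.
    """
    kept = basis[:k]
    coords = [sum(b_j * v_j for b_j, v_j in zip(b, pose)) for b in kept]
    out = [0.0] * len(pose)
    for c, b in zip(coords, kept):
        for j, b_j in enumerate(b):
            out[j] += c * b_j
    return out


def reconstruction_error(pose, basis, k):
    """Euclidean distance between a pose and its k-basis reconstruction."""
    rec = reconstruct(pose, basis, k)
    return sum((a - b) ** 2 for a, b in zip(pose, rec)) ** 0.5


toy_basis = [[0.6, 0.8], [-0.8, 0.6]]  # orthonormal, largest-variance vector first
toy_pose = [1.0, 2.0]
```

With both vectors kept the error vanishes; keeping only the first leaves the pose's component along the second vector (0.4 here) as the error.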

The apparent robustness of the spaces (when using the full number of bases) likely stems from the random initialization of the basis matrix during the wPCA procedure: any full-rank basis matrix will produce a perfect reconstruction. The fact that there are several redundant DOFs in our skeleton definition (since we use quaternions to specify all joints) reduces the rank required for a perfect reconstruction. To get good reconstruction behaviour when using fewer than the full set of bases, the user should construct the wPCA space from clips that exhibit a large range of motion. As we shall see in chapter 4, the mean pose of the constructing clip also affects the quality of joint interpolations. To reduce artifacts, each joint’s mean should be as close as possible to the interpolant joint orientations. Thus, the constructing clip should depict natural motion, preferably covering the same range of motion as the target motions that will make use of the resulting space. A standard motion capture range of motion test provides a good general case.

3.4 Pose Detection in Motion Curve Space

In this section we present a method for robustly detecting when a hierarchical skeleton assumes previously modeled poses. Interestingly, this application is what led to the development of the Motion Curves representation. The original context for the task

was segmenting real-time motion captured movements for use in a sonification-based

physical training system. The representational issues involved with the segmenting tasks

proved more interesting than the training system, however. As the expressive power

of the Motion Curves representation became apparent (as will be seen in the following

chapters), this work took its current form.

We want to be able to determine when the subject driving the motion data has

entered a specific static pose. This pose may be the canonical T-Pose, a certain martial

arts stance, or any other static configuration. A naive implementation would be to use

the pose distance metric to compare the incoming poses to an example of the target pose,

and threshold the results. The result of such a scheme is shown in figure 3.3.

Figure 3.3: Comparing the data to a single target pose. The horizontal axis is time, and the vertical axis is similarity.

In this example, the motion clip depicts a performer making random movements interspersed

with returns to the T-Pose. The naive approach does a remarkably good job of indicating

when the subject is in the T-pose - these regions in time are indicated by the plateaus

in the graph.


Figure 3.4: Comparing the data to an average target pose

We can improve upon these results by taking the average of many examples of the

T-pose as the target, rather than one particular instance of the T-pose. The results of

this test are shown in figure 3.4. Note that the results are almost identical to the naive

case. This is because the particular pose chosen for that test was near the average point.
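Both the naive and the averaged-target detectors can be sketched as follows. Because projection is linear, averaging the projected examples gives the projection of the averaged linearized poses. The thesis does not state how its similarity axis is normalized, so the mapping exp(-d / scale) and the threshold are assumptions, as are all function names.

```python
import math


def similarity_track(frames, target, eigvals, scale=1.0):
    """Similarity of each projected frame to a target, via the Eq. 3.4 distance.

    The mapping exp(-d / scale) into (0, 1] is an assumed normalization.
    """
    out = []
    for f in frames:
        d = math.sqrt(sum(w ** 2 * (a - b) ** 2 for w, a, b in zip(eigvals, f, target)))
        out.append(math.exp(-d / scale))
    return out


def mean_pose(examples):
    """Average target: component-wise mean of several projected examples of the pose."""
    n = len(examples)
    return [sum(col) / n for col in zip(*examples)]


def detect(similarities, threshold):
    """Return (first, last) frame intervals where the similarity stays above threshold."""
    intervals, start = [], None
    for i, s in enumerate(similarities):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(similarities) - 1))
    return intervals


frames = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [0.0, 0.1], [4.0, 4.0]]
target = mean_pose([[0.1, 0.0], [-0.1, 0.0]])
hits = detect(similarity_track(frames, target, [1.0, 1.0]), 0.5)
```

The detected intervals correspond to the plateaus in figures 3.3 and 3.4.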

Measurement error in the motion capture system prevents skeletal configurations from being repeated exactly, and on a larger scale, it is very difficult for an actor to repeat a pose exactly. A difference of millimeters in certain parts of the body will not usually be detectable to a human observer, especially if the two poses are separated by an intervening movement. Thus, instead of specifying one particular instance of a pose to be the Platonic ideal, we describe a distribution of valid poses. This leads to a much more robust comparison that allows some deviation from the “ideal” pose.

A special motion capture trial was taken wherein the actor tried to explore all possible

variations of the ready pose. This data was projected into a truncated Motion Curves

space, yielding a set of three-dimensional coordinates. A Mixture of Gaussians model

was fit to the data using readily available software [15]. The result of the clustering is

shown in figure 3.5.


Figure 3.5: Two-dimensional projection of the MoG model (axes X1 and X2)

Given a multidimensional MoG model, the likelihood of a novel point under a particular Gaussian distribution, and its logarithm, are given by:

$$P(\vec{x} \mid \vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right)$$

$$\ln P(\vec{x} \mid \vec{\mu}, \Sigma) = -\tfrac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu}) - \tfrac{d}{2} \ln(2\pi) - \tfrac{1}{2} \ln|\Sigma|,$$

where ~µ and Σ are the Gaussian’s mean vector and covariance matrix, d is the dimensionality of the MoG model, and ~x is the projected pose point.

A MoG model generally consists of more than one Gaussian distribution, along with associated weights. The likelihood of ~x is calculated for each distribution, and the overall likelihood is taken as the weighted sum. This formulation can be used within the same framework as our previous pose detection tests. The result of such an experiment is shown in figure 3.6. These results are very similar to the previous ones, because there was not much variance in the T-pose trial. This makes sense: the T-pose is used as a reference in animation because it is distinctive and easy to assume. The standard weighting scheme that we use heavily favours the shoulder and hip joints, and the space used for the MoG clustering model was truncated at three dimensions, so only a very gross description of the pose was retained. This is by design: we want a system that is robust to inconsequential differences. Note that this technique’s sensitivity can be targeted, since the importance of the various features of the pose is defined by the weighting scheme. If we wanted to detect a pose entirely by the position of the right arm, this could be accomplished through careful tuning of the weights and the truncation point.
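The log-likelihood equations above can be evaluated as in the following sketch. It uses a Cholesky factorization for the Mahalanobis term and log-determinant, and log-sum-exp to combine the weighted components stably. This is an illustration, not the software of [15], and all names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_log_pdf(x, mu, cov):
    """ln P(x | mu, Sigma) for a d-dimensional Gaussian."""
    d = len(x)
    L = cholesky(cov)
    # Forward substitution solves L z = (x - mu), so z . z = (x-mu)^T Sigma^-1 (x-mu).
    z = []
    for i in range(d):
        z.append((x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    maha = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))  # ln|Sigma|
    return -0.5 * maha - 0.5 * d * math.log(2 * math.pi) - 0.5 * log_det


def mog_log_likelihood(x, weights, means, covs):
    """ln of the weighted sum of component likelihoods, computed with log-sum-exp."""
    terms = [math.log(w) + gaussian_log_pdf(x, m, c)
             for w, m, c in zip(weights, means, covs)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


I2 = [[1.0, 0.0], [0.0, 1.0]]
```

Working in the log domain avoids underflow when a pose lies far from every component, which is when the likelihoods become vanishingly small.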

Figure 3.6: MoG model evaluation of motion trial (using a different dataset)

3.5 Summary

In this chapter we introduced motion curve space. We began by describing the steps

involved in constructing a motion curve space, and showed how motions can be put into

the space and taken out of it. Next, we explored the features of the space, establishing a

distance metric, discussing dimensionality reduction, presenting a method for visualizing

projected motions, and explaining how error can creep into the representation. Finally,


we discussed how the work came about, and presented a method for detecting poses in

motion curve space.

Chapter 4

Interpolation

Interpolation is a very important tool when working with most graphics entities, and

motion data is no exception. By blending between poses, it is possible to combine dis-

parate clips seamlessly, to portray a continuous range of some underlying characteristic,

or even to do inverse kinematics. In short, interpolation allows the animator to greatly

amplify the expressive power of a dataset very quickly. In this chapter we investigate

several techniques for interpolating across and between motion clips. These techniques

exploit the fact that every point in Motion Curves space fully specifies a valid pose. We

also present a case study in which we implement the motion graphs of Kovar et al. [45].

4.1 Two-Pose Interpolation

As mentioned in chapter 3, interpolating multidimensional angular values is not straight-

forward. Directly interpolating angular values in an Euler representation often gives

unsatisfactory results. Interpolating between two orientations is the same task as finding the rotational path that one of the orientations must travel to match the other. The rotational movement in the Eulerian case does not, in general, take the most direct route, and different rotation orders will give different results.

This problem can be solved by using quaternions. The spherical linear interpolation


(SLERP) operation finds the shortest path for a rotation by carving out a geodesic in

SO(3) [28]. Unfortunately, SLERP is a relatively expensive operation. SLERP also

suffers from the fact that it does not generalize easily to more than two interpolants, as

we shall see in the next section.

As shown in [37], the formula for SLERPing two quaternions Q1 and Q2 is

$$\mathrm{slerp}(Q_1, Q_2, t) = Q_1 \left(Q_1^{*} Q_2\right)^{t} \qquad (4.1)$$

where the operator ∗ denotes taking the quaternion conjugate, and t varies from 0 to 1.
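For reference, a per-joint SLERP can be sketched as below. It uses the standard sine form, which is equivalent to equation 4.1 for unit quaternions once q2 has been flipped into the same hemisphere as q1. The (w, x, y, z) component order and the 0.9995 cutoff for the near-parallel fallback are assumptions.

```python
import math


def slerp(q1, q2, t):
    """Spherical linear interpolation between unit quaternions given as (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q1, q2))
    if dot < 0.0:                      # q and -q encode the same rotation: take the short arc
        q2, dot = [-c for c in q2], -dot
    if dot > 0.9995:                   # nearly parallel: normalized lerp avoids dividing by ~0
        out = [a + t * (b - a) for a, b in zip(q1, q2)]
        norm = math.sqrt(sum(c * c for c in out))
        return [c / norm for c in out]
    theta = math.acos(dot)             # angle between the quaternions on the 4D unit sphere
    s1 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s2 = math.sin(t * theta) / math.sin(theta)
    return [s1 * a + s2 * b for a, b in zip(q1, q2)]


identity = [1.0, 0.0, 0.0, 0.0]
quarter_turn_z = [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)]  # 90 degrees about z
halfway = slerp(identity, quarter_turn_z, 0.5)                             # 45 degrees about z
```

The trigonometric calls per joint and per frame are the reason SLERP is relatively expensive compared with a plain vector blend.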

The usual method to interpolate between two poses is to apply SLERP at every joint. We present an alternate method for pose interpolation that takes advantage of the motion curve representation. Motion curve space is Euclidean¹, and every point within it corresponds to a fully specified pose. When projected into such a space, smooth body motions look like continuous, sampled curves. Intuitively, then, it would seem that simple Euclidean linear interpolation would provide a reasonable approximation of SLERP-based pose interpolation. For brevity, we call this interpolation method MC-LERP.

MC-LERP performs an interpolation that is very similar to the log-quaternion linear interpolation described by Grassia in [28]. If each adjacent three-vector within a motion curve space’s principal components can be interpreted as a linearized quaternion, and only two interpolants are used, the methods are equivalent. Grassia points out that while this interpolation does not guarantee travel along the SO(3) geodesic as SLERP does, a reasonable approximation can be made through careful selection of the reference orientation when linearizing the quaternion. The mapping that gives this approximation is the one that minimizes the Euclidean distance between the interpolants. MC-LERP clearly does not observe this criterion: the reference orientation for the linearization of all of the quaternions for a given joint is fixed during the construction of the wPCA space. Recall, however, that the orientation used is the sample mean from the constructing clip. Most joints in the human body have a relatively small range of motion, and many have only one or two degrees of freedom. If the constructing clip is truly indicative of the motions used in an interpolation, the reference orientation should be close to the minimizer. We posit that MC-LERP is good enough for use in most interactive situations where an animator would be using our system, and that the speed increase and the flexibility of using more than two interpolants outweigh the minimal visual artifacts that it introduces.

¹An n-dimensional Euclidean space is a space constructed such that the distance between any two points ~p1 and ~p2 is $\sqrt{\sum_{i=1}^{n} (p_1[i] - p_2[i])^2}$.
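Because projection is linear, MC-LERP with the full basis gives the same linearized pose as linearly interpolating the linearized joint vectors directly, that is, log-quaternion interpolation about the fixed joint means. The sketch below checks this on a toy orthonormal basis. The helper names and the numbers are illustrative assumptions, and the exponential map back to quaternions is omitted.

```python
def project(v, basis):
    """p = vB, where `basis` lists the orthonormal basis vectors (the columns of B)."""
    return [sum(b_j * v_j for b_j, v_j in zip(b, v)) for b in basis]


def unproject(p, basis):
    """v = pB^T."""
    dim = len(basis[0])
    return [sum(p_i * b[j] for p_i, b in zip(p, basis)) for j in range(dim)]


def mc_lerp(p0, p1, t):
    """MC-LERP: ordinary linear interpolation between two projected poses."""
    return [(1.0 - t) * a + t * b for a, b in zip(p0, p1)]


basis = [[0.6, 0.8], [-0.8, 0.6]]           # toy orthonormal basis
v0, v1, t = [0.2, -0.1], [0.5, 0.3], 0.25   # toy linearized poses
via_space = unproject(mc_lerp(project(v0, basis), project(v1, basis), t), basis)
direct = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
```

The two results agree up to rounding; any difference from SLERP therefore comes from the fixed reference orientation of the linearization, not from the projection.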

Experimentation provides empirical data to support our approximation. When applied

to various poses, the results of both interpolation methods look very similar, and both

look plausible.

The case for MC-LERP can be made stronger by investigating its behaviour when

applied to whole motions. Figure 4.1 shows a screenshot of a prototype application for

comparing motion interpolations. The user begins by loading two motion clips, and

arranging them on the timeline. Once the clips are in position, the parameters of the

interpolation can be set, and the actual operation performed. The results are shown both

applied to a figure, and projected into three dimensions. The user may choose to use

either SLERP or MC-LERP.

When interpolating between two motions, there are three cases to consider: when

the motions are separated in time, when they partially overlap, and when one motion is

contained within the other. We use the same interpolating weights for both SLERP and

MC-LERP, and define them for each case as follows.

• Separate: The lead-in and lead-out times define a linear ramp centered about the

mid-point between the end of the first clip and the start of the second clip. If the

clips are far enough apart in time that the ramp would reach its maximal value

before the start or end of the clip, it is stretched to fit. The ramp is not modified

if it is too wide for the gap between the clips.


Figure 4.1: Screen Shot from the Interpolation Explorer Prototype

• Overlapping: The ramp is constructed in the same way as the previous case.

• Containing: The lead-in and lead-out times define the widths of two ramps, one centered on each extremity of the contained clip, that bring the value away from and back to the containing clip. The interpolation weight between the two ramps is user-specified.
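The weighting cases above can be sketched as follows, where the returned value is the weight of the second (or contained) clip. Treating lead-in and lead-out as the ramp's extents before and after its centre is an assumption, and the "stretch to fit" rule for widely separated clips is omitted. Both function names are hypothetical.

```python
def separate_weight(t, first_end, second_start, lead_in, lead_out):
    """Weight of the second clip for the separate and overlapping cases.

    A linear ramp centred on the midpoint between the end of the first clip and
    the start of the second (0 = first clip only, 1 = second clip only).
    """
    mid = 0.5 * (first_end + second_start)
    lo, hi = mid - lead_in, mid + lead_out
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    return (t - lo) / (hi - lo)


def containing_weight(t, inner_start, inner_end, lead_in, lead_out, plateau):
    """Weight of the contained clip: up-ramp, user-specified plateau, down-ramp."""
    up = (t - (inner_start - lead_in / 2)) / lead_in      # ramp centred on inner_start
    down = ((inner_end + lead_out / 2) - t) / lead_out    # ramp centred on inner_end
    return plateau * max(0.0, min(1.0, up, down))
```

The same weights drive both SLERP and MC-LERP, so any visible difference between the two results comes from the interpolation operator alone.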

We compare the projections of MC-LERP and SLERP interpolations between a pair of

overlapping motions in figure 4.2. The motions for MC-LERP and SLERP, when applied

to a figure and played back, look very similar. The results are similar to those from the

single pose case. Significant artifacts can only be detected when the target poses are very

different, and far from the mean pose.

As mentioned in chapter 3, projecting and unprojecting motions is an expensive

operation. As such, MC-LERP is not a cost-effective replacement for SLERP in all

cases. In cases where other operations need to be done in Motion Curve space, however,

or when multiple interpolations must be done with the same data, the cost of projection

can be amortized. Moreover, as we shall see in the next section, MC-LERP can be used as a springboard to more interesting techniques that justify the computational cost of projection.

Figure 4.2: Motion interpolation comparison: LERP is on the left and SLERP is on the right

It is important to note that MC-LERP is convex in pose space only if the full set

of bases are used. Information about joint angles that is encoded in bases that are not included in a projection is not used in the interpolation. For example, if the weighting scheme is organized such that the angle of the figure’s elbow joint is encoded in the 35th basis vector, and only the first 34 bases are used during projection, the elbow joint will

assume the space’s mean position for the duration of the interpolation. The fidelity of an

interpolation is thus constrained by the fidelity of the projections involved. If all bases

are used, the joint angles of an interpolated pose should lie between the two example

poses’ joint angles.


4.2 M-way Interpolation

MC-LERP or SLERP can be used to create a blend of two poses. It is often useful to

combine more than two poses. For example, an animator might want to create a weighted

average of several emotionally charged poses to generate a continuum of expression.

One way to perform such an interpolation is to assign an order to the operations, and

proceed pairwise. This solution is problematic because chained pairwise interpolation is order-dependent: the order in which the interpolations are made changes the result. In certain constrained

cases, this might be acceptable. If the source poses have distinct meanings, a logical

ordering may be possible. For example, the first interpolation may establish the result’s

position along a single parametric axis, and the second may establish a separate axis. In

the general case, however, where the number of examples and their parametric values

are not known in advance, a consistent ordering is hard to define.

Johnson provides two separate algorithms for multi-way unit quaternion interpolation

in [37], which he calls Slime and Sasquatch. The Slime algorithm begins by linearizing

the interpolants about their sample mean. The result is then found using ordinary vector

interpolation on the linearized quaternions. This works well for range-limited joints, but

suffers from a discontinuity 180 degrees off-mean. The Sasquatch algorithm is more

general, and does not suffer from any discontinuities. It works by iteratively minimizing

an ODE representing the sum of the spherical distances between the result and the

weighted interpolants. Since it is an iterative procedure, it is not as fast as Slime.

Computationally, MC-LERP is just ordinary vector interpolation, so it is trivial to extend it to multiple interpolants. Its cost scales linearly with the number of interpolants.

We developed a prototype application to demonstrate the usefulness of multi-way

interpolation. This application is similar in spirit to the technique described by Igarashi

et al. in [35], but it differs greatly in the underlying interpolation technique. Using the

application, the user can select single poses from motion clips. The selected poses are


represented as points on a 2D plane. The user can arrange the points in any desired configuration by right-click dragging them. By left-clicking at 2D coordinates (X, Y), the user specifies a set of n interpolation weights w_i governed by the following equation:

$$w_i = \left((X - dx_i)^2 + (Y - dy_i)^2 + \varepsilon\right)^{-1/2}, \qquad (4.2)$$

where (dx_i, dy_i) are the ith interpolant’s planar coordinates, and ε is a small number. These weights are normalized by their sum, and then used to create a projected point from the projected example poses P_i:

$$\mathrm{Proj} = \sum_{i=1}^{n} w_i P_i. \qquad (4.3)$$

This point is unprojected and applied to a figure in real time. This gives the user a

puppet-like interface for specifying new poses. Figure 4.3 shows a set of example poses

with several interpolated results.
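Equations 4.2 and 4.3 amount to inverse-distance weighting followed by a weighted sum, as in the sketch below. The value of ε and the function names are assumptions, and unprojecting the result back to a pose is omitted.

```python
def planar_weights(X, Y, anchors, eps=1e-6):
    """Eq. 4.2: inverse-distance weights for each (dx_i, dy_i), normalized to sum to one."""
    raw = [((X - dx) ** 2 + (Y - dy) ** 2 + eps) ** -0.5 for dx, dy in anchors]
    total = sum(raw)
    return [w / total for w in raw]


def blend(points, weights):
    """Eq. 4.3: weighted sum of the projected example poses P_i."""
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(len(points[0]))]


anchors = [(0.0, 0.0), (1.0, 0.0)]   # planar positions of two example poses
examples = [[0.0, 0.0], [2.0, 4.0]]  # their (toy) projected coordinates
w_mid = planar_weights(0.5, 0.0, anchors)
proj_mid = blend(examples, w_mid)
```

The ε term keeps the weights finite when the user clicks exactly on an example; there, that example's weight approaches one and the puppet snaps to its pose.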

Figure 4.3: The Planar interpolation application


4.3 Improved Non-overlapping Blends

By inspection, the 3D projections of smooth motions appear to be smooth themselves.

Joining two clips that are separated in time by a linear interpolation through pose space

gives reasonable results, but suffers from a characteristic ‘over-smoothed’ appearance. One way to reduce the blending artifacts is to use a higher-order interpolation between the clips. We use a simple cubic Hermite spline that depends upon the positions and derivatives of the endpoints of the two clips. The value of each interpolated point is given by

$$x(t) = P_0(2t^3 - 3t^2 + 1) + P_1(-2t^3 + 3t^2) + v\left(D_0(t^3 - 2t^2 + t) + D_1(t^3 - t^2)\right), \qquad (4.4)$$

where P_0 and P_1 are the boundary poses of the two clips, D_0 and D_1 are their derivatives, and v is a control variable that the animator can set between zero and one to damp out the nonlinear portion of the interpolation.

We use a simple finite difference to estimate the derivative at the boundaries of each

clip. This can lead to problems if a clip is noisy, so smoothing should be done first to

get the best results.
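The sketch below evaluates equation 4.4 per coordinate, with finite-difference tangents. Rescaling the per-frame differences by the number of frames in the gap is an assumption, needed so that the tangents are expressed in units of the blend parameter t. The function names are hypothetical.

```python
def hermite_blend(P0, P1, D0, D1, t, v=1.0):
    """Eq. 4.4 applied to every coordinate of a projected pose, t in [0, 1]."""
    h00 = 2 * t**3 - 3 * t**2 + 1
    h01 = -2 * t**3 + 3 * t**2
    h10 = t**3 - 2 * t**2 + t
    h11 = t**3 - t**2
    return [a * h00 + b * h01 + v * (da * h10 + db * h11)
            for a, b, da, db in zip(P0, P1, D0, D1)]


def boundary_tangents(clip_a, clip_b, gap_frames):
    """Finite-difference derivatives at the end of clip_a and the start of clip_b.

    Differences are per frame; multiplying by gap_frames rescales them to the
    blend parameter t, assumed to run from 0 to 1 over gap_frames frames.
    """
    D0 = [gap_frames * (x1 - x0) for x0, x1 in zip(clip_a[-2], clip_a[-1])]
    D1 = [gap_frames * (y1 - y0) for y0, y1 in zip(clip_b[0], clip_b[1])]
    return D0, D1


def fill_gap(clip_a, clip_b, gap_frames, v=1.0):
    """Interpolated poses for the frames strictly between the two clips."""
    D0, D1 = boundary_tangents(clip_a, clip_b, gap_frames)
    return [hermite_blend(clip_a[-1], clip_b[0], D0, D1, i / gap_frames, v)
            for i in range(1, gap_frames)]


# Toy 1-D clips moving at a constant rate: the cubic fill continues that motion.
gap = fill_gap([[0.0], [1.0]], [[5.0], [6.0]], 4)
```

Setting v = 0 does not reduce the blend to a plain linear ramp. The path still follows the straight segment from P0 to P1, but with ease-in/ease-out (smoothstep) timing.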

In most cases, the effects of using the cubic interpolator are subtle. When working

with clips with large derivative values at their boundaries, however, the improvement is

quite noticeable. One such case is illustrated in figure 4.4. Here, the first clip portrays

the figure stepping to the left, while the second clip shows the figure shuffling forward into a crouch. With a linear interpolation, the momentum of the stepping motion is lost as soon as the interpolation begins. Cubic interpolation provides a smoother transition by incorporating the momentum within the interpolated result. The character anticipates the next motion as it crosses clip boundaries, without the robotic-looking transition artifact that is characteristic of linear blends. Of course, there is no physical simulation going on, so the figure’s ‘momentum’ in this discussion is simply a side effect of the Hermite blend. While nothing guarantees physical plausibility in the final animation, the observation that smooth motions project as smooth curves would tend to support the hypothesis that a cubic interpolation leads to more realistic motion than a linear interpolation.

Figure 4.4: Cubic interpolation in motion curve space

4.4 Case study - Motion Graphs

In [45], Kovar et al. introduce an adaptation of Schodl et al.’s Video Textures [59]

to motion data, which they call Motion Graphs. Both techniques create an endless

animation by finding instances of self-similarity within a finite data stream. Cross-over

points are built at these instances, and a directed graph structure is built to represent all

possible transitions. Animations can then be produced by traversing the graph. In order

to test the features of motion curve space, we built an implementation of the motion

graphs algorithm using Motion Curves as its underlying representation.

Porting the motion graphs algorithm to a new representation requires two major

pieces of infrastructure: a distance metric, and an interpolation operator. As shown in

chapters 3 and 4, both of these are readily available in the Motion Curves representation. In


addition, both operations are computationally simple, and are scalable in the sense that

extra speed can be bought at the price of accuracy.

The original motion graphs paper uses a very heavyweight pose difference calculation.

Two poses are compared by using them to deform polygonal meshes, resulting in a

pair of vertex point clouds. A closed-form optimization is then performed to find a

transformation to align the two point clouds and cancel the poses’ root transformations.

The distance is then taken to be the sum of squared displacements between corresponding

vertices. Interpolation is done using the usual quaternion-based SLERP on each joint.

Our implementation has two major components, which are embodied in two separate

programs. The first program creates the motion graph structure itself and saves the results to a file. The second program accepts the results of the first, and

presents a random walk around the graph. The walk is visualized by animating a figure,

and showing the projection of its current pose and the graph structure in Motion Curve

space.

Constructing the motion graph is a time consuming process. The algorithm accepts

as input a long motion clip. The first step is to construct a table of inter-pose distances

for the entire clip. The clip is usually smoothed and sub-sampled as a preprocessing step

to reduce the number of poses. Once the table is constructed, a diagonal filter is convolved with

the table to enforce causality, as described in [59]. The table is normalized, and local

minima are found. These local minima represent possible transitions.
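The table-building steps can be sketched as follows. A uniform box kernel along the diagonal stands in for the weighted kernel of [59], so two frames match only when their neighbouring frames also match. The normalization by the largest finite entry, the threshold, and the exclusion of near-diagonal self-matches are all assumptions, as are the function names.

```python
import math


def distance_table(poses, eigvals):
    """D[i][j] = weighted distance (Eq. 3.4) between projected frames i and j."""
    n = len(poses)
    return [[math.sqrt(sum((w * (a - b)) ** 2 for w, a, b in zip(eigvals, poses[i], poses[j])))
             for j in range(n)] for i in range(n)]


def diagonal_filter(D, half_width):
    """Average D along diagonals; entries too close to the border are left at infinity."""
    n, span = len(D), 2 * half_width + 1
    out = [[math.inf] * n for _ in range(n)]
    for i in range(half_width, n - half_width):
        for j in range(half_width, n - half_width):
            out[i][j] = sum(D[i + k][j + k] for k in range(-half_width, half_width + 1)) / span
    return out


def normalize(F):
    """Scale finite entries so the largest becomes 1."""
    top = max(x for row in F for x in row if x != math.inf)
    return [[x / top if top > 0 else x for x in row] for row in F]


def transition_candidates(F, threshold, min_gap=1):
    """(i, j) pairs that are 3x3 local minima below threshold, away from the diagonal."""
    n, found = len(F), []
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            if abs(i - j) <= min_gap or F[i][j] > threshold:
                continue
            neighbours = [F[i + a][j + b] for a in (-1, 0, 1) for b in (-1, 0, 1) if a or b]
            if all(F[i][j] <= x for x in neighbours):
                found.append((i, j))
    return found


# Toy 1-D "motion" that repeats every four frames.
poses = [[x] for x in [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]]
F = normalize(diagonal_filter(distance_table(poses, [1.0]), 1))
candidates = transition_candidates(F, 0.01)
```

The full table is quadratic in the number of frames, which is why smoothing and sub-sampling the clip first makes such a difference to construction time.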

The pose indices of the minima found in the table become vertices in a graph (built

using the Boost graph library [62]). The edges are created for each potential transition,

and for each pair of time-adjacent vertices. In this form, the motion graph may contain

dead ends, so only the largest strongly connected component is retained.
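The pruning step can be illustrated with Tarjan's strongly connected components algorithm, written here in plain Python rather than with the Boost graph library used in the implementation. The recursion is fine for small graphs, but a large motion graph would call for an iterative version or a library routine. All names are hypothetical.

```python
def largest_scc(nodes, edges):
    """Node set of the largest strongly connected component (Tarjan's algorithm)."""
    graph = {u: [] for u in nodes}
    for u, v in edges:
        graph[u].append(v)
    index, low, on_stack, stack = {}, {}, set(), []
    components, counter = [], [0]

    def visit(u):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        stack.append(u)
        on_stack.add(u)
        for v in graph[u]:
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:          # u is the root of a component: pop it off
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == u:
                    break
            components.append(comp)

    for u in nodes:
        if u not in index:
            visit(u)
    return max(components, key=len)


def prune(nodes, edges):
    """Keep only the largest SCC, so a random walk can never reach a dead end."""
    keep = largest_scc(nodes, edges)
    return sorted(keep), [(u, v) for u, v in edges if u in keep and v in keep]


# Natural edges 0->1->...->5 plus one transition edge 4->1.
kept_nodes, kept_edges = prune(list(range(6)), [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (4, 1)])
```

Node 0 (unreachable once left) and node 5 (a dead end) are dropped, leaving the loop 1 to 4 that a random walk can traverse indefinitely.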

A screenshot showing a single moment of a random walk is shown in figure 4.5. The

random walk is performed by traversing the graph structure, and taking the natural (non-

transitional) edge most of the time. MC-LERP is used to perform the interpolation when


transitional edges are taken. While such an undirected animation serves little purpose

on its own, it demonstrates the use of several of the algorithms developed in this thesis,

and is interesting to watch. Through this animation, we demonstrate the real-world

applicability of MC-LERP.

Figure 4.5: The Random Walk Demo Application

4.5 Summary

In this chapter we have shown how pose interpolation can be done in Motion Curve

space. We have subjectively compared our interpolation with the standard method, and

demonstrated how it can be easily applied to multiple blend targets. We also showed how non-linear interpolations can be used to create smoother transitions. Finally, we put our interpolation algorithm to use in implementing Kovar et al.’s Motion Graphs, and showed that it creates acceptable results.

Chapter 5

Geometric Operations

5.1 Overview

In this chapter we introduce several motion-editing operations that are made possible

by the Motion Curve space introduced in the previous chapter. Screenshots from the

prototype editing application illustrate both key concepts and the animator’s workflow.

Motions in this application are visualized ‘out of time’ as three-dimensional curves.

This representation emphasizes the object-editing metaphor employed by the geometric

operations, and presents a familiar interface to animators used to working with 3D model

editing programs. Figure 5.1 shows the results of two of the operations described in this

chapter.

5.2 Finding Mean Poses

Many of the operations that we develop require reference points or directions. Such

navigational guideposts can be difficult to come by in a high-dimensional space, especially

when the basis vectors do not necessarily correspond to anything meaningful to the

animator. The task is further complicated by the fact that an animator will often use more than one space in a single editing session: any landmarks created in one space will need to be re-projected or recalculated into the other spaces before they can be used.

Figure 5.1: Translation and Scaling Operations: The left image depicts a bounded translation applied to a running motion. The figure in the foreground has been brought closer to a ducking stance. The right image depicts a bounded scaling applied to a martial arts move. The foreground figure’s stance has been widened, and its hand motions exaggerated.

Given the lack of inherent points of reference in our spaces, we construct reference

points using projected data. This most often involves finding the sample mean of one or

more sets of poses.

Our unprojected data takes the form of a list of unit quaternions, each one repre-

senting the orientation of one of the figure’s joints. Taking the arithmetic mean of the

quaternion values does not work because the results are not necessarily of unit length,

and therefore do not correspond to a rotation. Also problematic is the fact that the quaternions Q and −Q, which correspond to the same rotation, have an arithmetic mean of zero, so the result cannot simply be re-normalized.

Johnson gives a robust algorithm for finding the sample mean of a set of quaternions in [37]. This algorithm, which is outlined in algorithm 2, reformulates the problem as an optimization exercise. The mean quaternion is taken to be the one that maximizes the sum of squared inner products with all of the data points, subject to the constraint that it must lie upon the unit hypersphere. Because each inner product is squared, Q and −Q contribute equally, which removes the sign ambiguity described above. This requires a large matrix multiplication and a 4 by 4 eigendecomposition. To find the mean of a pose, this calculation has to be repeated for each joint. We use this method during the construction of our Motion Curve spaces.

Algorithm 2 Calculating a sample mean quaternion

1. Construct a 4×N matrix Q containing the N sample quaternions in its columns
2. S = QQ^T
3. Perform the eigenanalysis of S
4. The mean quaternion can be taken as the eigenvector associated with the largest eigenvalue
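Algorithm 2 translates almost directly into NumPy. The sketch below assumes quaternions are stored as rows of an N×4 array with any consistent component order. It uses `numpy.linalg.eigh`, since S is symmetric. The helper `mean_pose` is a hypothetical name for repeating the calculation once per joint.

```python
import numpy as np

def mean_quaternion(quats):
    """Sample mean of unit quaternions following algorithm 2.

    quats: N x 4 array, one unit quaternion per row. The result is a unit
    quaternion whose overall sign is arbitrary (q and -q are the same rotation).
    """
    Q = np.asarray(quats, dtype=float).T      # 4 x N, samples in columns
    S = Q @ Q.T                               # 4 x 4 symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    q = eigvecs[:, np.argmax(eigvals)]        # eigenvector of the largest eigenvalue
    return q / np.linalg.norm(q)

def mean_pose(joint_quats):
    """Mean pose: the calculation repeated independently for each joint.
    joint_quats: frames x joints x 4 array."""
    J = np.asarray(joint_quats, dtype=float)
    return np.stack([mean_quaternion(J[:, j, :]) for j in range(J.shape[1])])
```

Because the objective squares each inner product, feeding in both q and −q gives the same answer as feeding in q twice. The arithmetic mean, by contrast, would collapse to zero.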

Once constructed, a Motion Curve space of dimensionality n behaves much like ℝ^n. Poses project to single points. Given the space’s closure over addition and scalar multiplication, we can use the arithmetic mean to calculate the mean of M projected points:

$$\bar{P} = \frac{1}{M} \sum_{i=1}^{M} p_i. \qquad (5.1)$$

This projected mean can be unprojected to yield an approximation to the quaternion-based mean pose. The quality of this approximation will depend upon the span of the Motion Curve axes, which in turn depends upon the choice of construction data.

Given a suitable space, this method for finding the sample mean is conceptually

straightforward and computationally cheap. It is expensive to project the data, but this

cost will be amortized over the application of several Motion Curve operations in most

use cases.


5.3 Scaling-Based Operations

In [9], Blanz and Vetter use scaling in a PCA space to generate caricatures of faces. In

their formulation, texture and segmented geometry are represented as points in separate

spaces that were derived from a database of face data. In order to generate a caricature,

the projected points are moved away from the origin by scalar multiplication. The logic behind this is that the origin of each PCA space represents a zero offset from the mean face

of the constructing dataset. A caricature of a face can be thought of as an exaggeration

of those features which distinguish it from the average. Scaling the projected datapoint

increases the distance from the average, and thus the perceptual distinctiveness of the

reconstructed face.

We can perform a similar operation in Motion Curve space. Given a projected motion

clip, an exaggeration can be made by performing a simple linear scaling about the origin.

This increases the distance between the origin and every pose in the motion. Scaling a

clip by a factor less than one subdues the motion, bringing it closer to the base point.

The expressive power of this operation can be increased by subjecting it to a time-

based envelope. Instead of scaling the entire clip, the animator can instead isolate a

particular segment to exaggerate. This is done by dragging start and end markers to the desired positions on the timeline. The envelope is ramped at either end to prevent popping

artifacts in the final animation. The animator can control the lengths of both the lead

in and lead out times. The application of the gating function is shown in figure 5.2. In

our prototype, we can apply this gate to any of the geometric operations.
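A minimal sketch of a gated scaling follows. Each frame is blended between its unedited pose and its scaled pose according to the gate value G(t). The exact ramp shape is an assumption: here the ramps are linear and lie just inside [t0, t1]. The names `gate` and `gated_scale` are hypothetical.

```python
import numpy as np

def gate(t, t0, t1, lead_in, lead_out):
    """Trapezoidal gate G(t) (assumed shape): 0 outside [t0, t1], a linear
    ramp of length lead_in after t0, a linear ramp of length lead_out before
    t1, and 1 in between. lead_in and lead_out must be positive."""
    t = np.asarray(t, dtype=float)
    rise = np.clip((t - t0) / lead_in, 0.0, 1.0)
    fall = np.clip((t1 - t) / lead_out, 0.0, 1.0)
    return np.minimum(rise, fall)

def gated_scale(points, factor, g):
    """Scale projected poses (frames x dims) about the origin, fading the
    edit in and out with per-frame gate values g."""
    P = np.asarray(points, dtype=float)
    scaled = factor * P                       # plain scaling about the origin
    g = np.asarray(g, dtype=float)[:, None]
    return (1.0 - g) * P + g * scaled         # blend original and edited poses
```

The same per-frame blend can wrap any of the other geometric operations, since it only needs the unedited and edited projections of each frame.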

A further improvement to the scaling operation can be made by changing its reference

frame. Scaling about the origin increases the dissimilarity between the poses in the clip

and the origin. The origin in a Motion Curve space represents the mean pose of the

training data set. This may not always be a good choice of baseline for a motion clip.

Figure 5.3 presents an example of a bad scaling. When a motion’s projection lies entirely to one side of the mean, for example, scaling about the origin also moves its projected ‘center of mass’, adding an unintended translation. This may or may not be the effect that an animator would be seeking in such a situation.

[Plot: the gate G(t), ranging from 0 to 1 over time, with markers t0 and t1 and the lead-in and lead-out intervals t_lead-in and t_lead-out]

Figure 5.2: The gating function and its application

Recall that we can quickly find the projected mean of an arbitrary set of poses. If

we find the mean of the animator-selected region during the scaling operation, we can

use it as the origin for the operation. This is accomplished using a simple chain of affine

transformations. First the projected poses are translated by −T_mean, the negative of the selection’s projected mean. Next, scaling is done as before. Finally, the poses are translated back into position by T_mean. Figure 5.3 shows the difference between local and global scaling.
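The chain of affine transformations for local scaling can be collapsed into one homogeneous matrix. This is the single matrix multiplication mentioned in the discussion of cost. The sketch assumes projected poses are rows of a frames×n array. The function names are illustrative.

```python
import numpy as np

def local_scaling_matrix(mean, factor):
    """Homogeneous (n+1)x(n+1) matrix for T(mean) @ S(factor) @ T(-mean)."""
    m = np.asarray(mean, dtype=float)
    n = m.size
    to_origin = np.eye(n + 1)
    to_origin[:n, n] = -m                 # translate by -T_mean
    scale = np.eye(n + 1)
    scale[:n, :n] *= factor               # uniform scaling about the origin
    back = np.eye(n + 1)
    back[:n, n] = m                       # translate back by T_mean
    return back @ scale @ to_origin

def apply_affine(M, points):
    """Apply a homogeneous matrix to every row of a frames x n array."""
    P = np.asarray(points, dtype=float)
    H = np.hstack([P, np.ones((P.shape[0], 1))])   # homogeneous coordinates
    return (H @ M.T)[:, :-1]

# Usage sketch: scale a selected segment about its own projected mean.
# selection = projected[start:end]
# edited = apply_affine(local_scaling_matrix(selection.mean(axis=0), 1.5), selection)
```

The selection's mean is a fixed point of the transformation, so the segment is exaggerated in place rather than pushed away from the space's origin.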

The main advantage of using linear scaling is that it offers a wide expressive range

for very little computational cost. The entire operation, including the translations, can

be wrapped into a single matrix multiplication, which can be optimized using SIMD

instructions on modern processors. Given that motion editing is most often an offline

technique, we are free to explore more expensive options.

Making the scaling factor dependent upon a point’s distance from the base point can lead to interesting effects. Using a shifted and scaled sigmoid function S(d) of that distance d to set the scaling factor modifies only certain portions of the motion. By adjusting the function’s parameters, it is possible to scale only those parts of the motion that differ from the baseline by at least some threshold. This is similar in effect to applying several separate, manually bounded scalings.

[Panels: Global Scaling | Local Scaling]

Figure 5.3: Global vs Local Scaling
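One possible form of the sigmoid-controlled scaling is sketched below, assuming the factor ramps from 1 (no change) near the base point to a target factor k far from it. The parameters `d0` (threshold distance) and `width` (sharpness of the transition) are hypothetical names for the sigmoid's shift and scale.

```python
import numpy as np

def sigmoid_scale(points, base, k, d0, width):
    """Distance-dependent scaling about `base`.

    Poses much closer than d0 keep a factor near 1. Poses much farther than
    d0 are scaled by a factor near k. `width` controls how sharp the
    transition is."""
    P = np.asarray(points, dtype=float)
    b = np.asarray(base, dtype=float)
    offset = P - b
    d = np.linalg.norm(offset, axis=1)
    s = 1.0 / (1.0 + np.exp(-(d - d0) / width))   # shifted, scaled sigmoid S(d)
    factor = 1.0 + (k - 1.0) * s
    return b + factor[:, None] * offset
```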

Complex space warps can be built by combining multiple scaling fields with exponen-

tial drop-offs. Such a field can be configured to attract motions toward certain areas of

pose space, or away from others. Using poses taken from library animations, an animator

can ‘sculpt’ a field that will bend motions toward characteristic poses. New animations

can then be ‘coloured’ by applying the field to their projections. An example of this is a

field that attracts footfall poses toward exaggerated, limping counterparts.
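The thesis does not give a formula for combining scaling fields, so the following is only one plausible sketch. Each field scales poses about its own center. Its effect decays exponentially with distance from that center, and the fields are applied one after another. A factor below 1 attracts nearby poses toward the center and a factor above 1 repels them. The function name and the (center, k, falloff) tuple layout are assumptions.

```python
import numpy as np

def apply_fields(points, fields):
    """Apply a sequence of scaling fields to projected poses (frames x dims).

    Each field is (center, k, falloff): poses are scaled about `center` by
    1 + (k - 1) * exp(-d / falloff), where d is the distance to the center.
    k < 1 pulls nearby poses toward the center; k > 1 pushes them away."""
    P = np.asarray(points, dtype=float)
    for center, k, falloff in fields:
        c = np.asarray(center, dtype=float)
        offset = P - c
        d = np.linalg.norm(offset, axis=1)
        factor = 1.0 + (k - 1.0) * np.exp(-d / falloff)
        P = c + factor[:, None] * offset
    return P
```

A limping field could then place centers at exaggerated footfall poses taken from a library clip, with k < 1 so that nearby footfalls in a new motion are drawn toward them.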

5.4 Translation-Based Operations

As we saw in chapter 4, Motion Curve space allows us to blend between poses using

a simple linear interpolator. Translation along a straight line can be viewed as a special

case of interpolation, where the two ends of the interpolation are the initial point and


some point located along the vector describing the translation. The translation operators

in our system work on the idea of similarity: poses can be made more or less like other

poses by linear translations through space. For example, a normal walking motion can

be translated toward a crouching pose to make a crouching walk.

The simplest translation that can be made is one along one of a space’s axes. The

effects on the figure’s pose will of course depend upon the principal poses of the current

space. These poses are not guaranteed to be meaningful to an animator, although they

can be manipulated through judicious setting of the joint weights when building a Motion

Curve space.

Indeed, it is possible to build a puppet-like interface using such a scheme. By directly

mapping a translational offset to input devices, the animator can rapidly change a pose (or motion). Unfortunately, most input devices offer only a few degrees of freedom, so the amount of control this scheme gives is limited. Still, if suitable principal components can be developed in a space, a single input DOF will map to several correlated joints in the final animation.

The biggest problem with using translation to modify a pose through its Motion

Curve space projection is one of navigation. It is hard to control, or even visualize, many

more than just a few degrees of freedom at once. In most general spaces, the axes will not

relate to a feature that the animator wants to control anyway. Clearly, if the animator

is to navigate Motion Curve space with any degree of effectiveness, we need to establish

a more intuitive navigational framework.

The easiest way to establish a location in Motion Curve space is not to construct

it manually, but to take it from example. Given a desired pose, it is easy to project

it into the space, and calculate the direction to it from another point. A single pose

can be made more or less like another by constructing a vector connecting the two and

translating along it.

Entire motion clips, or gated segments of clips, can be translated in a similar manner.


In this case, however, the base of the translational direction vector is the mean pose of

the set of points to be edited. Translating the segment as a whole preserves the relative

positions of its poses, such that the motion’s original character is preserved. Figure 5.4

demonstrates this process.

Figure 5.4: Translating a clip toward a target. In this case a sneaking motion is translated

toward the ‘hands-up’ pose pictured on the left.

By selecting two poses that represent two extremes in a continuum, the animator can

create an axis along which to move other clips in parallel. The advantage of this ap-

proach over translating directly toward a target is that incidental features of the original

animation are preserved. Figure 5.4 presents a comparison of the two methods.
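The two translation styles can be contrasted in a short sketch. Projected poses are assumed to be rows of an array, and the function names are illustrative. In the first, the direction starts at the segment's projected mean and ends at the target. In the second, the direction comes from two animator-chosen extreme poses. Both apply one offset to every pose, so the segment's internal shape is kept.

```python
import numpy as np

def translate_toward(points, target, amount):
    """Move a segment of projected poses toward a target pose.
    The direction starts at the segment's projected mean, so every pose gets
    the same offset and the relative positions of the poses are preserved."""
    P = np.asarray(points, dtype=float)
    direction = np.asarray(target, dtype=float) - P.mean(axis=0)
    return P + amount * direction          # amount = 1 places the mean on the target

def translate_along_axis(points, pose_a, pose_b, amount):
    """Translate poses parallel to the axis running from pose_a to pose_b."""
    axis = np.asarray(pose_b, dtype=float) - np.asarray(pose_a, dtype=float)
    return np.asarray(points, dtype=float) + amount * axis
```

With the axis version, the offset does not depend on where the clip currently sits. Two different clips therefore move in parallel by the same amount.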


5.5 Filtering

The notion of using signal processing techniques on motion data is not new. Bruderlin

and Williams proposed several frequency-based techniques in [12]. Using an angle-based

representation for such a task introduces many problems, as was described in chapter

3. We avoid many of the problems inherent to angle-based representations by bringing

signal processing techniques into Motion Curve space. Each dimension of the Motion

Curves data can be treated as an independent one-dimensional signal. Filters can then

be applied by convolving their kernels with the signals. The width of the kernel will

depend upon the sampling rate of the motion, and the desired effect.

One of the most useful kernels to use on motion data is the low-pass filter. Motion

capture data is often plagued by high-frequency noise, which manifests as popping and

jittering upon playback. Most of the content in large-muscle human motion is quite

low-frequency, so filtering out the high-frequency noise has little visual effect. Low-pass

filtering is necessary to prevent aliasing when sub-sampling the data. Other filters are

also useful for processing motions. In [49], band-pass filters are used to separate style

from content in angle-based motion data.
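Treating each Motion Curve dimension as an independent signal gives a very short convolution-based smoother. The sketch assumes a Gaussian kernel, though the text does not specify the low-pass kernel. It also pads the ends by repeating the first and last samples so the output keeps the clip's length. Here `sigma` is measured in frames.

```python
import numpy as np

def lowpass(curves, sigma):
    """Gaussian low-pass filter applied independently to each dimension.

    curves: frames x dims array of projected poses. The ends are padded by
    repeating the first and last samples, so the output has the same length."""
    C = np.asarray(curves, dtype=float)
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()                     # unit gain: constants pass unchanged
    padded = np.pad(C, ((radius, radius), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, d], kernel, mode="valid")
                     for d in range(C.shape[1])], axis=1)
```

The kernel width in frames should scale with the capture rate. The same `sigma` gives half the smoothing in time at 120 Hz that it gives at 60 Hz.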

5.5.1 A Wavelet Approach to Smoothing

One disadvantage of using convolution to perform smoothing is that it is quite slow.

Performance can be greatly increased at the cost of some preprocessing if a multiresolution

analysis is done first. Wavelet theory is a very rich and mathematically dense subject. We present a very limited description of one particular type of wavelet, and defer a thorough explanation to more authoritative sources, such as [17].

A wavelet decomposition expresses a signal as a hierarchical combination of basis

functions. There are many different basis functions to choose from, each having different

characteristics. We use the Haar basis because of its easy implementation. Despite


its simplicity, we have achieved interesting results using the Haar function. Further exploration using continuous bases [22] may be warranted, but it is important to note that using more complicated bases will come at the cost of increased computation, which may reduce the practical effectiveness of the techniques we present.

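The Haar basis function is described by the following formula:

$$h(x) = \begin{cases} 1 & \text{if } 0 < x \le \tfrac{1}{2}, \\ -1 & \text{if } \tfrac{1}{2} < x \le 1, \\ 0 & \text{if } x \le 0 \text{ or } 1 < x. \end{cases} \qquad (5.2)$$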

The decomposition is made by recursively taking averages of adjacent data points,

and encoding the results and differences. An example decomposition is shown in figure

5.5. One requirement of the decomposition is that the signal has a length that is a power

of two. Since our motion data is most often of an arbitrary length, we perform a linear

supersampling before decomposition.

Columns: average | difference from average

original data: 1 7 4 0 9 4 8 8 2 4 5 5 1 7 1 1
l3: 4 2 6.5 8 3 5 4 1 | -3 2 2.5 0 -1 0 -3 0
l2: 3 7.25 4 2.5 | 1 -.75 -1 1.5
l1: 5.125 3.25 | -2.125 .75
l0: 4.1875 | .9375
result: 4.1875 .9375 -2.125 .75 1 -.75 -1 1.5 -3 2 2.5 0 -1 0 -3 0

Figure 5.5: The Haar wavelet basis function in action
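The decomposition in figure 5.5 can be reproduced with the short sketch below, which works on one Motion Curve dimension at a time. It stores, for each pair, the first element's difference from the pair average. This is the convention the figure's numbers follow, where −3 = 1 − 4. The linear supersampling uses `numpy.interp` to stretch a clip to the next power-of-two length. Function names are illustrative.

```python
import numpy as np

def resample_pow2(signal):
    """Linearly supersample a 1-D signal to the next power-of-two length."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    m = 1 << (n - 1).bit_length()          # smallest power of two >= n
    return np.interp(np.linspace(0, n - 1, m), np.arange(n), x)

def haar_decompose(signal):
    """Haar decomposition in the layout of figure 5.5:
    [mean, coarsest difference, ..., finest differences]."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []                           # collected finest level first
    while x.size > 1:
        avg = 0.5 * (x[0::2] + x[1::2])    # pairwise averages
        details.append(x[0::2] - avg)      # first element's difference from the average
        x = avg
    return np.concatenate([x] + details[::-1])
```

Applied to the 16 samples of figure 5.5, `haar_decompose` returns the figure's result row.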

Given the wavelet data, the original signal can be reconstructed by reversing the operations used in the decomposition. Reconstruction proceeds as a series of refinements. The level zero reconstruction is simply the first value in the wavelet representation, which is the arithmetic mean of the signal. The level one reconstruction is two elements long. If the level zero reconstruction is r0, and the second value in the wavelet representation is w2, the level one reconstruction can be expressed as [ r0 + w2, r0 − w2 ]. (In figure 5.5, 4.1875 + 0.9375 = 5.125 and 4.1875 − 0.9375 = 3.25.) Further levels can be reconstructed by following the pattern recursively. In this way, a smoothed signal can be reconstructed more quickly than it can be found by convolution with a low-pass filter.

One limitation of this approach is that it allows smoothing only at discrete levels.

When using a filter-based approach, one can simply widen the kernel, but that is not an

option with a discrete wavelet representation. One solution, which is based on a technique

used in [22], is to reconstruct two adjacent discrete levels of the curve, and perform a

linear interpolation between the two to get the continuous-level result. The first three

dimensions of a Motion Curves signal are shown at continuous levels of smoothing in

figure 5.6.
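Reconstruction by refinement and continuous-level smoothing can be sketched as follows, using the coefficient layout of figure 5.5. Blending requires the two adjacent levels to have the same length. Expanding each coarse reconstruction by repeating its samples to full length is an assumption here, since the text does not say how the levels are brought to a common length. The parameter `s` grows with the amount of detail kept.

```python
import numpy as np

def reconstruct_level(w, level):
    """Rebuild the 2**level-sample approximation from Haar coefficients w,
    laid out as in figure 5.5: [mean, coarsest difference, ..., finest]."""
    w = np.asarray(w, dtype=float)
    approx = w[:1].copy()                  # level 0: the mean of the signal
    for k in range(level):
        diff = w[2**k:2**(k + 1)]          # differences stored for this level
        nxt = np.empty(2 * approx.size)
        nxt[0::2] = approx + diff          # first element of each pair
        nxt[1::2] = approx - diff          # second element of each pair
        approx = nxt
    return approx

def smooth_continuous(w, s):
    """Smoothing at a continuous level s in [0, log2(len(w))], where larger s
    keeps more detail: blend the reconstructions at floor(s) and floor(s) + 1,
    each expanded to full length by repeating samples."""
    w = np.asarray(w, dtype=float)
    n = w.size
    max_level = n.bit_length() - 1        # n is assumed to be a power of two
    s = float(np.clip(s, 0, max_level))
    lo = int(np.floor(s))
    hi = min(lo + 1, max_level)
    t = s - lo
    a = np.repeat(reconstruct_level(w, lo), n // 2**lo)
    b = np.repeat(reconstruct_level(w, hi), n // 2**hi)
    return (1.0 - t) * a + t * b
```

Setting `s` to the maximum level returns the original signal, and `s = 0` returns a constant curve at the signal's mean.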

An area where wavelet representations have seen much success is compression. Using the algorithm presented in [65], we were able to compress motion data captured at 120Hz to around half of its original size before artifacts became visible. When the artifacts did appear, they took the form of shakiness: the figure still performed the recorded motions (and looked good in individual still poses), but moved with a palsy. Since the

key poses are still hit, it may be possible to avoid the artifacts through an intelligent

resampling. While compression was not a major area of research for this work, we theorize

that more impressive results could be found by leveraging the inherent dimensionality

reduction powers of the weighted PCA space in addition to the wavelet result.

Johnson did some initial work with unweighted pose PCA for compression in [37], but found the results mixed at best, suggesting that more complex reduction techniques that search for curved manifolds rather than orthogonal bases might provide better compression. The use of weighted PCA, where the weights are used to concentrate the more important degrees of freedom (perceptually speaking) in the leading PCs, might improve the results.

Figure 5.6: Continuous levels of smoothing. The original motion is shown on the left. The area highlighted in red is shown under increasing levels of smoothing in the sequence on the right.

5.6 Case Study - PCA Explorer

The PCA Explorer application presents a good example of the direct application of a

geometric algorithm to Motion Curve space.

In its default mode, the application allows the user to control a puppet figure by

directly navigating the three most significant dimensions of a PCA space. This interface

is limited by two major factors. The first is that there simply is not much expressive power in only three dimensions. Correlated joints lead to motion in more than three of the puppet’s degrees of freedom, but there are still only three axes of control. The second problem is that it is difficult to control three dimensions at one time using only a mouse.


The application may be set into nearest neighbour mode to help offset these limita-

tions. In this mode, an example motion clip is loaded and projected into the PCA space.

When the example motion is loaded, its three dimensional projection is used to populate

an octree structure. The closest projected point to the user’s cursor can then be found

using the standard algorithms associated with octrees. When the user moves the cursor,

the system finds the closest point in the projection, and applies the full-dimensional pose

from that point in the original clip to the puppet.

By adjusting the viewing angle, the user can find planes of control that can give

good results with only two input degrees of freedom. The motion of the puppet is

still quite limited, but the improvement over the default direct manipulation scheme is

impressive. The expressiveness of the system could easily be increased by using the M-way interpolation technique described in chapter 4 instead of a direct projected nearest-neighbour

query. In chapter 6, we will expand upon the idea of using octrees to perform queries in

Motion Curve space when we use the approximate nearest neighbour algorithm, which

is a refinement of kd-trees.

A possible extension to this interface that would enhance interactive performance

would be to use a predictive feedback system similar to that used in [47]. This would

allow the user more freedom in exploring the space of poses around the figure’s current

position without necessarily affecting the resulting animation.

5.7 Extensions: Joint Limits and Selective Blending

When applying any of the operations discussed in this chapter, it is quite easy to cause

the figure to adopt an unnatural posture. Posture errors can take the form of inter-

penetrations, broken foot contacts, and hyper-extended joints. While the current system

favours flexibility over realism, and as such does not attempt to fix any of these problems,

the framework could be extended to support clean-up as a post-processing step.


Inter-penetrations occur when one part of the model moved by the motion data inter-

sects with another part. For example, the figure’s hands might collide with each other.

Detecting such collisions requires knowledge of the geometry moved by the motion data.

There has been much research in the robotics and simulation fields on collision-avoiding motion planning for articulated figures [57, 42, 53].

Broken foot contacts can make the figure appear to float above the ground, or even

descend into it. Currently, the system can be set to move the figure vertically until its

lowest joint touches the ground. This is a quick and easy method that works in most

situations, but has some problems. The most glaring issue is that the figure is unable to

jump. Certain extreme poses, such as if the motion capture actor were to reach down

below its feet while standing on a non-modeled step, would be rendered incorrectly.

Some of these issues could be resolved using physical simulation, but that lies outside

the current scope of the Motion Curves representation.

Mathematically, a quaternion-based joint in an articulated figure can assume any ro-

tational value. No naturally-occurring joint has such freedom, however. Even a shoulder

joint has clearly determined limits. Many commercial animation packages support the

concept of joint limits, where joints can have arbitrary restrictions placed upon them

to limit their range of movement [2]. The actual values of the limits can be based on

anatomic data, or automatically calculated from reference motions [30]. While this can be

helpful to maintain physically realistic motion, animators often choose to go ‘off-model’

to get certain effects. Given the gross over-extensions that navigating Motion Curve space can create, however, toggleable limits would probably be useful. A more useful feature

might be a pose inspector that gave warnings on invalid rotations without limiting their

movement.

Currently, all edits made in Motion Curve space are applied to the entire skeleton. By

design, the weighted PCA procedure finds correlations between the joints’ movements,

and these correlations end up being reflected in the principal components. This is nor-


mally advantageous: it multiplies the expressive power of each input degree of freedom

in a principled way. Changing a degree of freedom that moves one major joint will often affect sympathetic joints (as defined by the training data) ‘for free’. To get a specific

pose or motion, however, an animator might want to intentionally break these discovered

correlations.

Under the current system, this is not possible. It would be a useful extension to

allow the animator to freeze joints that are in the correct position, and protect them

from the effects of further edits. An alternate solution would be to segment the body

into major regions (such as limbs), and train individual (but smaller) wPCA spaces for

each. This would allow for the body parts represented in each space to be moved totally

independently. The trade-off in this situation would be that the animator would receive

very little ‘for free’ from correlation, and at least one value for each segment would have

to be specified to define a full pose.

5.8 Summary

In this chapter we have detailed several geometric operations that can be used to edit

motions projected into a Motion Curve space. An efficient method to calculate an ap-

proximation to the mean of a set of poses was presented first. This result was used in

the development of useful operators based on scaling and translation. Next, we described

filtering operations in Motion Curve space, and showed the use of the Haar wavelet transform for quick smoothing and compression. We concluded the discussion by pointing out some possibly useful extensions to the technique. We will see some of the techniques outlined in this chapter put to use in chapter 6.

Chapter 6

Unsegmented Motion Searching

In this chapter, we present a new method for searching long motions for regions of

high similarity to a shorter query motion. We begin by providing motivation for the

search. Then we present our search algorithm and an overview of the system, followed by

detailed descriptions of each of its components. We finish the chapter by presenting an

experimental evaluation of the system, and highlighting areas for future research. This

work appears in [24].

6.1 Introduction and Motivation

As the corpus of information regarding virtually every human endeavour grows exponentially, computer-based indexing and searching become correspondingly more important. The ubiquity of the Web could not have occurred without the coincident rise

of the search engine. There is little value in information unless one can explore it. Once

a collection of data grows to a certain size, its index becomes almost as important as its

content.

In some ways, designing a search algorithm for web pages is easy. The Web is primarily

text-based, and comes pre-packaged in a structural mark-up language. Other forms of

information are not such easy targets. Recently, there has been much academic interest


in search engines for non-textual media. Search algorithms and heuristics exist for most

common media, such as still images, video, and audio. These types of data have received

the most attention because of their ubiquity. In an era of cheap digital cameras and

considerable disk storage, even individual consumers are starting to require some kind of

media indexing solution.

Digitized motion data is expensive to create and manipulate. Its creation requires

the talents of a skilled animator using specialized software, or exotic and finicky motion

capture hardware. The motion data that goes into the production of a feature animation

represents an investment of millions of dollars. As an animation studio accumulates more such data, it is in its best interest to leverage this investment, or at least to try. In order to do so, however, it needs an efficient way to search the data.

Animators often save time when creating new animations by working from prior ex-

amples. It is often more productive to modify a walk cycle to match the requirements

of a particular situation than to start from scratch on every scene. As individual pro-

duction studios accumulate 3D character animation, the possibilities for motion reuse

at once grow and diminish. Reuse becomes potentially more fruitful, since there are

more examples to choose from, but the act of actually finding useful clips gets consid-

erably more difficult. Motions, whether they are key-framed or motion-captured, are

high-dimensional objects that are hard to compare numerically. Two motions that look

similar to a human observer may in fact be numerically very dissimilar using certain

representational schemes. In most cases, searching through a catalog for a particular type

of motion quickly becomes an exercise in patience, memory, and hard work. Clearly,

a method for quickly searching a database of motions is a prerequisite for large-scale

motion reuse. Furthermore, it is important to develop similarity measures that can be

more readily adapted to user needs.

We present a method for querying a skeletal motion database with example clips. The

motion database is constructed from one or more long motion sequences. These sequences


can be taken from previously finished animations, unsegmented motion capture trials, or

manually keyframed motion tests. All motions must be expressed over the same skeleton.

As a preprocessing step, all sequences in the database are re-sampled to a uniform rate,

and spliced together to form a single long motion. The database is queried with a short

example clip, which ideally expresses one distinct motion, such as a single reach, step,

punch, or jump. The search algorithm finds the subsegments of the database which are

most similar to the query, subject to a nonlinear time warping. These subsegments are

ranked and returned as the search results.

6.2 Algorithm

Given our motion representation and pose distance metric, we will now describe our

motion search strategy. Our search application is similarity-based. This means that the

user must have an existing motion clip with which to query the system. This clip could

be from a library of pre-segmented, canonical actions, the results of a previous query, or

even from real-time motion capture. As a preprocessing step, both the query clip and

the database are projected into a user-specified wPCA space. Projecting the database

is an expensive operation, but it only needs to be done once for each wPCA space, and

the results can be placed in permanent storage. After the query clip is projected, its

characteristic pose is found. An efficient spatial sorting data structure is then used to

find the indices of all similar poses in the database. These indices are clustered to reduce

redundancy, and then a variant of the dynamic time warping algorithm is used to warp

the database subregions surrounding the cluster means to match the query clip as closely

as possible. The resulting warps are ranked according to fit, and returned as the search

results. We will now describe each step in this process in detail. Pseudocode for the

querying operation is given in algorithm 3.


Algorithm 3 Performing a Query

Require: projected database and query clip, and offset to characteristic pose in query
  Perform an Approximate Nearest Neighbour search query with the characteristic pose
  for all ANN results r do
    if r can be joined with an existing cluster c then
      Grow cluster c, join with neighbours if necessary
    else
      Create a new cluster initialized with r
    end if
  end for
  for all cluster min points do
    Calculate the forward and backward distance tables
    Find the min forward and backward paths
    Join the two half paths
    Calculate the mean warp distance
  end for
  return the sorted warp paths
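The clustering loop at the start of algorithm 3 can be approximated as follows. Seed frame indices are sorted and split wherever consecutive indices are more than `max_gap` frames apart. Each run is then represented by its closest pose, which gives the "cluster min points". The joining rule and the `max_gap` parameter are assumptions, since the text does not define when a result "can be joined". Sorting first produces the same runs as growing and merging clusters one result at a time.

```python
import numpy as np

def cluster_seeds(indices, distances, max_gap):
    """Group seed frame indices into runs whose neighbours are at most max_gap
    frames apart, and return the closest pose (minimum distance) of each run."""
    order = np.argsort(indices, kind="stable")
    idx = np.asarray(indices)[order]
    dist = np.asarray(distances, dtype=float)[order]
    reps, start = [], 0
    for k in range(1, len(idx) + 1):
        # close the current run at the end of the data or at a large gap
        if k == len(idx) or idx[k] - idx[k - 1] > max_gap:
            run_idx, run_dist = idx[start:k], dist[start:k]
            reps.append(int(run_idx[np.argmin(run_dist)]))
            start = k
    return reps
```

The dynamic time warping step would then be run once per returned index instead of once per raw ANN result.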


6.2.1 Finding the Characteristic Point

Queries represent single, coherent motions. Such motions can often be expressed using

single poses [51]. We call these poses characteristic points, and we will use them as

starting points in our motion search. First, however, we must come up with a workable

definition of “characteristic”.

A good characteristic point for a punching motion would be the moment of maximum

arm extension. All punching motions contain some element of arm extension. Most

verb-level action descriptions, such as stepping, jumping, or ducking imply some similar

common element. In a step motion, the characteristic point could be the moment when the

legs are farthest apart. The characteristic point of a jump might be at its apex. Likewise,

a ducking motion might be characterized by its lowest point. The common thread in all of

these examples is that the characteristic point represents a moment of maximal extension

or deviation from some neutral pose.

This concept fits well with our motion representation. If we define the neutral pose to be some point in the wPCA space, we can find the characteristic point with respect to it by searching for the query pose most distant from that point. The neutral pose can be defined in any number of ways. If the action phase of the query is short relative to the whole clip, the mean pose of the query provides a good approximation. If the query is cleanly segmented, it may be reasonable to assume that the subject begins and ends in a neutral pose; either boundary pose could then be used directly, or alternatively the mean of the two could be taken. The origin of the PCA space represents the mean pose of the (probably significantly longer) motion used in its creation, so it can also serve as the neutral point.
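Each of these neutral-pose choices reduces the search for a characteristic point to an arg-max over distances in the projected space. The Python sketch below assumes the query has already been projected into the scaled wPCA space as a NumPy array with one row per frame; the function name `characteristic_point` and its `neutral` options are illustrative, not part of the original software.

```python
import numpy as np

def characteristic_point(query, neutral="mean"):
    """Return the index of the query frame farthest from a neutral pose.

    query:   (T, d) array of poses projected into the scaled wPCA space.
    neutral: "mean"   -> mean pose of the query clip,
             "ends"   -> mean of the first and last frames,
             "origin" -> wPCA origin (mean pose of the training clip).
    """
    query = np.asarray(query, dtype=float)
    if neutral == "mean":
        ref = query.mean(axis=0)
    elif neutral == "ends":
        ref = 0.5 * (query[0] + query[-1])
    elif neutral == "origin":
        ref = np.zeros(query.shape[1])
    else:
        raise ValueError(f"unknown neutral pose option: {neutral!r}")
    # Euclidean distance of every frame to the neutral pose.
    dists = np.linalg.norm(query - ref, axis=1)
    return int(np.argmax(dists))
```

When the action phase is long relative to the clip, the "mean" option drifts toward the action itself, which is one reason a user-specified point may still be preferable.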

Our objective in choosing a characteristic point is to find a class of poses that is guaranteed to have a close analogue in all possible matching motions, but is unlikely to exist in non-matches. If the pose is too common, spurious matches will drown out the actual results during the next step of the algorithm. For this reason, finding a suitable characteristic point is a crucial task. Of course, if the query is quite short, it is not unreasonable to require the user to specify the characteristic point directly. For certain types of queries, this gives better results than the automatic methods.

6.2.2 Generating Seed Points

An exhaustive solution to our search problem would be to use DTW to rank all possible

alignments of the query clip and the database. This is analogous to the technique in

[44], but is slow because the DTW operation is expensive. In order to provide interactive

response rates to the user, we must cull the search space before the DTW step. We refer

to this culling as finding the seed points in the database. Seed points are the indices of

poses in the database that are similar to the characteristic point of the query.

Our measure of similarity is the Euclidean distance within the scaled wPCA space, so we can use algorithms from computational geometry to speed up our search. We also have to choose the number of dimensions within which we will operate. The weighting scheme used to construct the wPCA space greatly influences the results of a search, and this can be exploited to considerable advantage when searching selectively. The weights should be picked by the user to reflect the constraints of the animation for which s/he is searching.

There are several different search structures that would work for our implementation. We chose Approximate Nearest Neighbour (ANN) search because of its quick running time, its flexibility, and its readily available source code [52]. The effects of varying the parameters of the ANN software are discussed in the results section.
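The ANN software used here is a C++ library, so the sketch below does not call it. Instead it substitutes an exact brute-force k-nearest-neighbour search in NumPy, which should return the same neighbours as an ANN query with ε = 0 (up to ties), only more slowly on large databases. The function `seed_points` and its `dims` argument, which restricts the search to the leading wPCA components, are hypothetical.

```python
import numpy as np

def seed_points(database, char_pose, k, dims=None):
    """Indices of the k database frames nearest to the characteristic pose.

    Exact brute-force search, standing in for an ANN query with epsilon = 0.
    database:  (N, d) projected database poses.
    char_pose: (d,) projected characteristic pose of the query.
    dims:      optionally search only the first `dims` wPCA components.
    """
    db = np.asarray(database, dtype=float)
    q = np.asarray(char_pose, dtype=float)
    if dims is not None:
        db, q = db[:, :dims], q[:dims]
    d2 = np.sum((db - q) ** 2, axis=1)           # squared Euclidean distances
    k = min(k, len(db))
    idx = np.argpartition(d2, k - 1)[:k]          # k smallest, in no order
    return idx[np.argsort(d2[idx], kind="stable")]  # nearest first
```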

6.2.3 Seed Point Clustering

Motion in the database takes the form of contiguous, time-ordered strands of pose points. Nearest neighbour searches within such a context therefore return runs of temporally adjacent points. Since the seed points are passed to the DTW algorithm in the next step, each point in such a run would yield a valid but nearly identical result. We cluster the seed points both to avoid overwhelming the user with hundreds of very similar results and to improve the search's run time. The clustering is performed on the seed points' time indices. Because this data is one-dimensional, integer-valued, and mostly sequential, it is very well-behaved and easy to cluster: data points are simply collected into intervals that are contiguous (to within a noise term) as they are found. Each cluster is then represented by its closest point, the seed that is nearest to the characteristic point.
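A minimal sketch of this one-dimensional clustering follows. It assumes each seed arrives with its distance to the characteristic pose, and it models the noise term as a hypothetical `gap` parameter. Unlike the incremental scheme described above, it sorts the indices first; under this simple gap rule that should produce the same intervals, in O(r log r) rather than the incremental method's quadratic worst case.

```python
def cluster_seeds(seed_indices, distances, gap=2):
    """Group seed frame indices into contiguous runs; keep the best of each.

    seed_indices: database frame indices returned by the neighbour search.
    distances:    distance of each seed to the characteristic pose.
    gap:          hypothetical noise term; consecutive indices at most
                  `gap` frames apart share a cluster.
    Returns one representative frame index per cluster: its closest seed.
    """
    pairs = sorted(zip(seed_indices, distances))
    clusters = []  # each cluster is a list of (index, distance) pairs
    for idx, dist in pairs:
        if clusters and idx - clusters[-1][-1][0] <= gap:
            clusters[-1].append((idx, dist))
        else:
            clusters.append([(idx, dist)])
    # min() keeps the first of several equal distances, i.e. the earliest frame.
    return [min(c, key=lambda p: p[1])[0] for c in clusters]
```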

6.2.4 Dynamic Time Warping

The query signal has a well-defined start and end point, but we have no such luxury

when looking for a subsequence within the database. The search is further complicated

by the fact that motions tend to be performed slightly differently each time they occur.

Changes in motion timing which are subtle to a human observer may cause an enormous

numerical difference.

Both of these issues are surmounted through the use of dynamic time warping. As

discussed in chapter 2, DTW is a signal processing technique that finds a non-linear

alignment minimizing the error between two signals. It returns a time displacement

function that compresses and dilates one of the signals to match the other. DTW

has been used extensively on sound signals for speech recognition, and is often used to

improve the interpolation of multiple motion clips [12, 43].

Each clustered index represents a single moment of similarity between some point in

the database and the characteristic point of the query. A valid time warp must pass

through this point. The warp is also constrained to run from the beginning to the end

of the query. These constraints are visualized in figure 6.1.

Figure 6.1: The DTW constraints, shown on the distance table (horizontal axis: Database Motion; vertical axis: Query Motion). The warp must pass through the intersection of the characteristic and seed points, and contact both horizontal edges of the table. The search is constrained by causality, so distances in the shaded areas of the table need not be calculated.

We can divide the time warp into two subproblems: one running forward in time and one running backward. The method for solving each subproblem is identical. First, a distance table is computed involving the pertinent half of the query and the corresponding section of the database. A limit imposed on the slope of the warp line through the distance table provides a bound on the size of the table. Starting from the characteristic point, each cell in the table is filled with the pose distance between its indexed animation frames plus the minimum of its previously filled neighbours. Once the table is filled, the minimum value along the query's boundary frame is found. The DTW path is then found by greedily searching back through the table toward the characteristic point. This search is subject to causality, so there are at most three possible steps to take at any given point. The slope limit is enforced to prevent degenerate warps. Degeneracies in the warp are still possible around the characteristic point, but these can be culled during the results ranking.
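The two-half time warp described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the thesis implementation: poses are rows of NumPy arrays in the wPCA space, the pose distance is Euclidean, and the slope limit `s` is modelled as a cone anchored at the seed cell (a cell (i, j) is admissible only if j ≤ s·i and i ≤ s·j). The backward half reuses the forward routine on time-reversed slices; `half_warp` and `anchored_warp` are hypothetical names.

```python
import numpy as np

def half_warp(query, db, s=2.0):
    """Anchored DTW from cell (0, 0) to the query's last frame.

    query: (qf, d) poses starting at the characteristic pose.
    db:    (m, d) poses starting at the seed frame.
    Returns (path, cost); path is a list of (query_offset, db_offset) pairs,
    or (None, inf) if no admissible warp reaches the query boundary.
    """
    query = np.asarray(query, dtype=float)
    db = np.asarray(db, dtype=float)
    qf = len(query)
    m = min(len(db), int(s * (qf - 1)) + 1)     # slope limit bounds the width
    cost = np.full((qf, m), np.inf)
    for i in range(qf):
        for j in range(m):
            if (i, j) != (0, 0) and (j > s * i or i > s * j):
                continue                         # outside the slope cone
            d = np.linalg.norm(query[i] - db[j])
            if i == 0 and j == 0:
                cost[i, j] = d
                continue
            prev = min(cost[i - 1, j] if i > 0 else np.inf,
                       cost[i, j - 1] if j > 0 else np.inf,
                       cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            cost[i, j] = d + prev
    j = int(np.argmin(cost[qf - 1]))             # best cell on the boundary frame
    total = float(cost[qf - 1, j])
    if not np.isfinite(total):
        return None, np.inf
    i, path = qf - 1, [(qf - 1, j)]
    while (i, j) != (0, 0):                      # greedy backtrack: <= 3 candidates
        steps = [(a, b) for a, b in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                 if a >= 0 and b >= 0]
        i, j = min(steps, key=lambda p: cost[p])
        path.append((i, j))
    path.reverse()
    return path, total

def anchored_warp(query, db, cq, sd, s=2.0):
    """Warp forced through (cq, sd): join a backward and a forward half path."""
    query = np.asarray(query, dtype=float)
    db = np.asarray(db, dtype=float)
    fwd, _ = half_warp(query[cq:], db[sd:], s)
    bwd, _ = half_warp(query[cq::-1], db[sd::-1], s)   # time-reversed halves
    if fwd is None or bwd is None:
        return None
    back = [(cq - i, sd - j) for i, j in reversed(bwd)]
    forward = [(cq + i, sd + j) for i, j in fwd]
    return back + forward[1:]                    # anchor appears in both halves
```

With this cone, the first step away from the anchor is always diagonal; other slope constraints, such as limiting runs of consecutive horizontal steps, would be equally reasonable choices.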

When the characteristic point is in the middle of the query, splitting the DTW into two

problems halves the number of distance calculations required. There are several methods

available to further reduce the number of calculations, and/or improve the warp quality

[16, 41]. Our pose distance metric is quick enough that this has not been necessary to

achieve interactive rates with the test data that we have used. Another desirable feature of the distance metric is that it is possible to trade accuracy for speed by using fewer than its full number of dimensions in the calculation.


6.2.5 Results Ranking

The time warps must be scored before they can be returned as ranked search results.

Any of a number of motion distance measures can be used. We define the final score of

a warp to be the average pose distance of each cell in its path. This measure does not

penalize warping, so it is more forgiving of timing differences in the results. Alternative measures could take into account the effects of outliers along the path, or penalize time distortions. It may also be useful to cull results that have large degeneracies about the characteristic point, or at least penalize them so that they are ranked lower.
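The scoring rule above, the mean pose distance over the cells of a warp path, is short to state in code. The sketch assumes poses are NumPy arrays in the wPCA space and that each warp path is a list of (query frame, database frame) index pairs; `warp_score` and `rank_warps` are illustrative names.

```python
import numpy as np

def warp_score(query, db, path):
    """Average pose distance over the cells of a warp path (lower is better)."""
    q = np.asarray(query, dtype=float)
    d = np.asarray(db, dtype=float)
    qi = np.array([i for i, _ in path])
    dj = np.array([j for _, j in path])
    return float(np.mean(np.linalg.norm(q[qi] - d[dj], axis=1)))

def rank_warps(query, db, paths):
    """Score every non-empty warp and sort best first."""
    scored = [(warp_score(query, db, p), p) for p in paths if p]
    scored.sort(key=lambda t: t[0])
    return scored
```

Because every cell counts equally, a warp that lingers on well-matched frames is not penalized, which is exactly the forgiveness toward timing differences noted above.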

6.2.6 Interface

Our implementation consists of several linked applications. Space weights are specified

in XML files, and wPCA spaces are created using a command line utility. The search

algorithm itself is implemented as a GUI-based application. The user begins a session by

loading database and query clips. A PCA space description file must also be loaded, and

further spaces can be loaded on the fly afterward. The user can preview the query clip

by scrubbing along a timeline. After setting the search parameters, the user can perform

a search by clicking the search button. Results are displayed as a list of offset times,

sorted by score. The user can see a side-by-side comparison of any result with the query

by selecting it and scrubbing along the timeline.

6.3 Results

6.3.1 Synthetic Data

We first used synthetically generated data to verify the functioning of our system. This allowed us to produce clean motion clips with controlled variations in movement parameters. We used the physically based animation system designed by Neff and Fiume to generate this data procedurally [55].

Two synthetic motion sets were generated. The first is approximately 300 seconds

long, sampled at 50 fps. The figure begins by raising its right arm 15 times. The exact

position of each raised hand was selected from a 10 cm³ cube using a uniform random distribution. The posture of the figure was randomly set along the Alberts axis from 0.25 to 0.75 [55]. Finally, the overall timing of each motion was scaled, ranging from 0.7 to 1.5

times the normal length. The figure then performs 20 similarly varied left arm raises, and

20 double-arm-raises. It finishes by performing 10 identical shrugs at different speeds,

and 10 slouches. The second dataset only contains arm raises, but the bounds for the

arm targets are increased in the x and y directions by a factor of three.

Neff and Fiume's animation system uses an SD-Fast-derived physical model [33]. Using the measurements provided in the SD system definition file, a similar skeleton definition was created. The mass definitions from the SD file were used to set the wPCA weights: each joint was weighted by the amount of mass under it in the skeleton hierarchy. No range-of-motion trial was available to train the PCA space, so we used the longer of the two samples.
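The mass-based weighting can be computed in one pass over the hierarchy. In this sketch the skeleton is a hypothetical parent map, and each joint carries the mass of the body segment it drives; counting a joint's own segment as part of the mass "under it" is our assumption.

```python
def subtree_mass_weights(parent, mass):
    """Weight each joint by the total mass of the subtree rooted at it.

    parent: dict joint -> parent joint (None for the root).
    mass:   dict joint -> mass of the segment attached below that joint.
    """
    weights = dict(mass)  # start with each joint's own segment mass
    # Every joint's mass also counts toward all of its ancestors.
    for joint, m in mass.items():
        p = parent[joint]
        while p is not None:
            weights[p] += m
            p = parent[p]
    return weights
```

Under this scheme the root always receives the total body mass, so wPCA errors near the top of the hierarchy, which move the most mass, dominate the distance metric.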

A 3D projection of the long motion clip is shown in figure 6.2. The individual motion

classes stand out as the path extremities. The sample mean of the training data is the

same as the rest position in this clip, and is represented by the cluster of points at the

origin. More complicated motions embedded within spaces generated from richer training

data take on a much less angular appearance when projected in three dimensions. This

is consistent with the fact that the synthetic data was designed to have a low inherent

dimensionality.

6.3.2 Validation

Validation was performed using the synthetic data. The first of each type of arm-raising motion was manually segmented from the longer motion clip, and then used to query both motion clips. The quality of the query results was highly dependent upon the characteristic point used and the size of the initial ANN search.

Figure 6.2: PCA projection of synthetic motion, showing low inherent dimensionality.

The automatic characteristic point finder did not work well with the arm movements,

because the time that the hands are raised is very long in relation to the length of the

whole clip. This shifted the clip’s mean point away from the rest position, and put the

characteristic point down near the rest pose. Using the rest pose as the query point in the

ANN search led to mostly spurious results, with the time warping algorithm left trying

to match essentially random segments of the database to the query. Manually setting the

characteristic point to the moment of maximum arm extension and re-querying allowed

the algorithm to proceed as designed.

The size of the initial ANN search determines the broadness of the results. The structure of the motion data is such that a spatial proximity search returns long chains of time-adjacent points for any query. We simplify the results in the clustering phase, but in order to find all valid target clusters, the initial search must be set to return a very large number of points. For example, when querying the long clip with the raised left hand motion, the ANN search must be widened to 1130 points before all of the raised left hand targets are found. If the search is widened even further, other types of motions start to creep into the results. At 1900 points, several instances of the ‘raise both hands’ action are returned after all of the ‘raise right hand’ actions. This is good behaviour, since the ‘raise both hands’ motion is perceptually closer to raising only the right hand than any other action in the database. The motion distance measure also holds up, since it consistently ranks the second tier of matches below the actual matches.

Interestingly, querying the database with one of its own motions does not always rank that motion highest. This is due to several factors. During the manual segmentation

of the query clips, the designated motion range is re-sampled, introducing small numerical

differences. With synthetic data, the figure holds absolutely static poses at several points

during its performance. These quiescent points tend to occur at the characteristic points,

so the ANN search often finds several identically closest points. The clustering technique

has no way to distinguish among these points, so it picks the first. This causes some

noise in the alignment of the database ranges with the query clip, which is corrected by

the time warping step. This correction has a non-zero cost, so self-matching does not

necessarily hold.

6.3.3 Motion Capture Data

After verifying the system with synthetic data, we tested its real-world applicability

with motion capture data. The data was collected using a Vicon optical motion capture

system, and post-processed into joint angles using the Vicon IQ software. This data uses

a different skeleton than the synthetic data, having the same dimensionality (19 joints) but a different arrangement (see figure 6.3). The Vicon system samples at 120 Hz.

Many individual motion trials were captured, each on the order of a couple of minutes in length. Dedicated range-of-motion trials were made to train the PCA spaces. Trials included walking, cyclic motions, martial arts moves, and others. As with the synthetic data, query clips were manually segmented from the longer trials. An example projection is shown in figure 6.4.

Figure 6.3: The two skeletal structures used (panels: Vicon Real-Time Skeleton; SD-Fast Skeleton).

Searching with the real data worked well, but the results were not as clean as those

from the synthetic tests. An illustrative example is a search done using recorded Aikido

movements. The actor in the Aikido motion trial performed a specific script. By tinkering

with the location of the characteristic point, it is possible to cause the system to return

movements in the wrong order. The matches are still consistent, however, with the bodies

being in similar, if not identical, poses. The ranking that our search method provides is

somewhat arbitrary, much like the ranking of web pages returned by most web search

portals.

Differently weighted PCA spaces can be used to modify the results. Redoing the previous example with a weighting scheme that emphasizes the arms improves the ranking of the results, because the position of the arms is what most clearly differentiates the moves. An animator can use different spaces to accomplish specific goals. For example, when animating a character picking up an object, it would be a good idea to use a space weighted heavily toward the hand the character is using.

Figure 6.4: wPCA space projection of a martial arts move.

6.3.4 Scalability

One of the most important features of a search algorithm is its scaling behaviour. In this

section we discuss the theoretical complexity of our algorithm and present some search

heuristics. We then provide experimental results that demonstrate the efficacy of these

techniques.

The complexity of the overall search algorithm is best evaluated in individual sections. The complexity of the ANN k-nearest neighbour search in $d$ dimensions over $n$ points with error bound $\epsilon$ is shown in [3] to be $O((c_{d,\epsilon} + kd)\log n)$, where $c_{d,\epsilon}$ is a constant dependent upon $d$ and $\epsilon$. This gives good performance with large databases, but is quite sensitive to dimensionality. The clustering technique that we used has a worst-case complexity of $r(r+1)/2$ and returns $c$ clusters, where $r$ is the number of results returned by the ANN search. Each application of our DTW algorithm requires at most $sq^2$ and at least $sq^2/2$ pose distance calculations to build the distance table, and $3s$ comparisons to trace a path through the table (where $q$ is the length of the query and $s$ is the slope limit). Sorting the $c$ warps takes $c \log c$ comparisons. The aggregate complexity of all of the steps is

$$(c_{d,\epsilon} + kd)\log n + \frac{r(r+1)}{2} + c\,(sq^2 + 3s + \log c).$$

Length (s)   Num. Results   Query Time (ms)
54           46             580
100          21             263
130          30             662
230          40             854
300          41             877
440          25             534
670          27             580

Table 6.1: Query Time vs Database Length

We tested our system with motion capture databases of various sizes to experimentally

evaluate its scalability. The results of the tests are given in table 6.1.

The data shows that our implementation’s real-world performance largely depends

upon the number of results passed on to the DTW step, rather than the size of the

database. The databases used for the last two trials contained exact matches for the

query. It follows that the area around the characteristic point would be dense with

consecutive poses, which would lead to more clustering and fewer returned results. This emphasizes the importance of having both a unique characteristic point and a good clustering result. It also indicates that superior performance can be gained by adjusting the DTW parameters. All tests in this thesis were performed using a 3 GHz Pentium 4 computer.


Parameter              Value
Data sampling rate     120 Hz
Database length        70 seconds
Query length           2.8 seconds
ANN dimensionality     10
ANN search size        1000
ANN epsilon (ε)        0
DTW dimensionality     5

Table 6.2: Baseline Parameters

6.3.5 Performance Optimization

There are several ways to improve the running time of the algorithm. Most involve a

trade-off of either search breadth or accuracy for query time performance. In order to

build a basis for comparison, we set up what could be considered an average query. We

used the Aikido data described in the previous section, and the search parameters listed

in table 6.2. Using the automatically generated characteristic point, the average query

time (from 10 trials) was 347 ms, and the search returned 12 results. Using a manually

specified characteristic point, the average query time was 130 ms, with 9 results returned.

The results from the manual characteristic point trial were subjectively much closer to

the query than those from the automatic trial, so the manual point was used in all

subsequent trials.

Pre-smoothing the data

The Vicon system samples motion at a default rate of 120 Hz, a much higher rate than is necessary to capture most large muscle movements. Reducing the sampling rate obviously reduces search time. We resampled the database and the query at a progression of sampling rates to illustrate the effects of this reduction on average query time. The results are shown in table 6.3. As expected, using fewer samples greatly speeds up the algorithm.


Rate (Hz)   Num. Results   Query Time (ms)
233         9              453
166         10             158
58          9              29
29          9              7

Table 6.3: Query Time vs Sampling Rate

The quality of the results is consistent until the low sampling rate conflicts with the

clustering algorithm. At very low rates, the gaps between clusters all get filled, and the

system fails by returning a single result.
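The text does not spell out how the data was smoothed or resampled, so the following is only one plausible approach, offered under that caveat: a moving-average (box) filter to suppress aliasing, followed by linear interpolation onto a coarser time grid. It operates on projected pose coordinates; `smooth_and_resample` and its `window` parameter are hypothetical.

```python
import numpy as np

def smooth_and_resample(frames, src_hz, dst_hz, window=3):
    """Box-filter a (T, d) pose sequence, then resample it to dst_hz.

    frames: poses sampled at src_hz, one row per frame.
    window: width of the moving-average filter in frames (1 = no smoothing).
    """
    x = np.asarray(frames, dtype=float)
    if window > 1:
        kernel = np.ones(window) / window
        pad = window // 2
        # Edge padding keeps the output length equal to the input length.
        padded = np.pad(x, ((pad, window - 1 - pad), (0, 0)), mode="edge")
        x = np.stack([np.convolve(padded[:, k], kernel, mode="valid")
                      for k in range(x.shape[1])], axis=1)
    t_src = np.arange(len(x)) / src_hz
    t_dst = np.arange(0.0, t_src[-1] + 1e-12, 1.0 / dst_hz)
    # Linear interpolation of each coordinate onto the new time grid.
    return np.stack([np.interp(t_dst, t_src, x[:, k])
                     for k in range(x.shape[1])], axis=1)
```

Heavier smoothing also blurs the gaps between runs of seed points, which is consistent with the cluster-merging failure at very low rates.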

Adjusting the ANN parameters

The scalability results indicate that the performance barrier in the system lies with the

DTW stage. That being said, an investigation of the effects of the ANN parameters is important, if only to verify the earlier result.

The number of results returned by the ANN search affects both its own running

time, and the number of final results after the clustering step. Using the default setup

as a baseline, we varied the number of neighbours to be returned by the ANN search.

The results are given in table 6.4. With a small number of neighbours, many potential

results are missing. As the number of neighbours increases, there tend to be more results

returned. After a point, the seed points’ neighbourhoods grow too large, and incorporate

spurious points, which in turn causes the clustering algorithm to create improper clusters.

The ‘sweet spot’ between too few results and over-clustering depends upon the specific

nature of the database and query being used.

One of the most desirable features of ANN search is that it can deliver improved performance if some measure of error is acceptable. Paradoxically, increasing this error tolerance leads to reduced performance in our system. Non-exact results cause gaps in the runs of nearest neighbour poses, which leads to an increased number of clusters.

Neighbours   Num. Results   Query Time (ms)
100          4              54
250          5              70
360          6              80
490          8              109
600          8              118
1000         9              132
1500         9              137
2500         12             193
3010         17             256
5000         12             219

Table 6.4: Query Time vs Neighbours

Adjusting the dimensionality of the data used for the ANN search had a negligible effect

upon the running time of the overall system.

Adjusting the DTW parameters

The running time of the search algorithm depends directly on the number of clustered results that make it to the DTW phase, so the time warping algorithm is a good area upon which to focus optimizations. An easy optimization is to exploit the flexibility of our motion representation by reducing the dimensionality of the pose distance metric. As shown in table 6.5, reducing the accuracy of the distance comparisons does improve the system's run time. For the types of motions tested, the quality of the warp is not subjectively affected by reducing the dimensionality. This of course depends upon how the variance is distributed among the principal components. The user is required to pick a set of weighted PCA bases that reflect the content of the motions that s/he is using.
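Reducing the dimensionality of the pose distance amounts to truncating the projected coordinate vectors. Because the wPCA axes are ordered by decreasing variance, the leading coordinates carry most of the distance, and a truncated Euclidean distance can never exceed the full one. A sketch with a hypothetical `dims` argument:

```python
import numpy as np

def pose_distance(a, b, dims=None):
    """Euclidean distance between two projected poses.

    dims: if given, use only the leading `dims` wPCA coordinates; the result
    is then a cheaper lower bound on the full-dimensional distance.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if dims is not None:
        a, b = a[:dims], b[:dims]
    return float(np.linalg.norm(a - b))
```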


Dimensions   Num. Results   Query Time (ms)
1            9              70
3            9              101
5            9              130
20           9              200
54           9              372

Table 6.5: Query Time vs DTW Dimensions

6.4 Summary

In this chapter we have presented a search algorithm for use with sampled motion data.

In doing so we have also developed a representation for motion data that introduces a

meaningful distance metric for poses. We have shown how an animator can control the

properties of the wPCA space through its weights, and how this may be used to direct

the search results. We have demonstrated the use of the search algorithm on both real

and synthetic data, and have analyzed its performance. Finally, we have experimented

with the algorithm’s settings in order to gauge its scalability to large databases. In our

tests, the algorithm returned both examples of the same motion as the query clip, and

different motions with similar features. This algorithm produces reasonable results in

very little time, allowing a user to quickly locate clips with which to work.

Chapter 7

Conclusion and Future Work

7.1 Conclusion

This thesis has introduced the Motion Curve representation for sampled motion data.

We presented a detailed description of the steps for constructing a motion curve space,

taking care to show how it relates to existing motion representation schemes. We explored

the features of the space, highlighting its strong points, but also pointing out potential

weaknesses.

Through this exploration of motion curve space, several new motion editing techniques

were introduced. The most serendipitous, perhaps, was the pose detection algorithm

that motivated the development of the space in the first place. While investigating the

properties of projected interpolation, we happened upon an efficient and flexible method

for blending more than two poses. An automatic motion segmentation algorithm was

hinted at by some of our work with geometric operators, although we chose not to flesh

it out, in favour of doing work on other aspects of the research problems. The motion-

editing geometric operations described in chapter 5 provide a compelling tool set for

modifying existing motion clips to suit new needs, or to adjust their subjective qualities.

In creating new techniques for editing motion data, we have expanded the number of tools at the disposal of animators. The overarching goal of all of the tools that we have developed is to provide the means to quickly modify existing motions to create new motions. It is our hope that this will allow animators to rapidly prototype motions without resorting to traditional motion creation techniques.

We have also shown the usefulness of having an interpolating embedding space for

motion data when developing automatic processing algorithms. The simplified interpo-

lation scheme that we presented in chapter 4 opens the door for new and interesting

applications, such as the interpolated plane interface that we demonstrated. Having a

well-defined pose distance metric also facilitates automatic processing, as shown by the

success of the search algorithm of chapter 6.

7.2 Future work

No thesis is complete without a discussion of further work that could be done and refine-

ments that could be made to the existing work. Indeed, there are several areas where

this work could be continued, ranging from trivial extensions to more fundamental en-

hancements.

7.2.1 Representation

Principal components analysis was chosen as the basis for the motion curves projection

because of its simplicity, and because of its existing uses in the literature. PCA projection

is a linear operation, so the degree to which it can compress non-linear structures embedded within the data is limited. This thesis did not dwell upon the use of PCA for compression, at least in part because of initially lackluster results. Using a different change of basis, however, might provide results worth pursuing.

Independent Component Analysis (ICA) is a linear projection technique, like PCA

[34]. The difference is that ICA seeks to find bases that are as statistically independent


as possible. This may or may not be of benefit for the techniques described in this thesis.

Often, controlling more than one correlated factor in an animation is a desired effect.

Nonlinear dimensionality reduction techniques fit reduced spaces to curved structures

in original data. An example of such a technique is IsoMap [67]. Most motions projected

into a PCA space exhibit non-linear structure, so they could, in theory, be characterized

quite well by such a technique. This would be useful in building models of particular classes of repeated motions, but less so for a heterogeneous mix of motions. Nonlinear dimensionality reduction could be done on the original linearized motion data, or after wPCA projection in order to take advantage of the weighting scheme.

7.2.2 New Operators

The collection of operators presented in chapter 5 is by no means exhaustive. Nonlin-

ear operators, such as distance-based translations, could be used to create caricatured

motions. The biggest problem with implementing complex operations is in creating an

intuitive user interface for actually using them.

An example of an easy-to-implement, usable tool that could be added is an

interactive speed adjuster. The speed at which a motion is performed is determined by the

apparent density of samples along the projected motion curve. This could be displayed to

the animator using a colour coding scheme, where brighter colours correspond to faster

movements. Then, by using a painting analogy, the animator could specify re-sampling

rates along the curve to adjust the performance speed.
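The proposed speed display could start from the spacing between consecutive samples on the projected curve: widely spaced samples mean fast movement. The sketch below, with hypothetical names, measures per-segment speed in wPCA units per second and normalizes it to a brightness value in [0, 1]; the actual colour mapping and the painting interface are left open.

```python
import numpy as np

def curve_speed(curve, hz):
    """Per-segment speed along a projected motion curve (wPCA units/second)."""
    c = np.asarray(curve, dtype=float)
    return np.linalg.norm(np.diff(c, axis=0), axis=1) * hz

def speed_to_brightness(speed):
    """Map speeds to [0, 1] so that faster segments are drawn brighter."""
    lo, hi = float(np.min(speed)), float(np.max(speed))
    if hi == lo:
        return np.ones_like(speed)  # uniform speed: draw everything bright
    return (speed - lo) / (hi - lo)
```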

7.2.3 Search Refinements

Our search system was designed to be integrated into a larger motion editing system

utilizing our pose space representation. In this context, the search system can be used

to find motions that are similar to a specified clip, so that fine adjustments can be made

via interpolation. It can also be useful as a motion exploration tool. A clip created using


the various editing tools can be used as a query in order to find a similar, but more

realistic motion. The time warping code prototyped for the search system is also useful

for improving the quality of arbitrary interpolations.

A possible use of the system would be to quickly apply markup to a large motion

database. This procedure would start with a small set of manually marked-up clips. The

mark-up would take the form of ⟨descriptor, value⟩ tuples, where descriptor describes

the motion’s action, and value indicates how well the descriptor fits. For example, if

the descriptor is ‘step forward’, an unambiguous step forward might have the value of

1, while a ‘step forward and to the left’ would have a lower value. Queries could be

performed using each of the motions in the marked-up set. The ranges that are returned

from each query would then take on the descriptors of the queries, with values set to be

a function of the query’s values and the result’s ranking score. After mark-up, semantic

queries could be made to the database very quickly.
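One way to realize this label propagation is sketched below. The text only says that a result's value should be a function of the query's value and the result's ranking score; the particular rule used here, value / (1 + score) with the maximum kept when several queries label the same range, is a hypothetical choice, as are the data layouts.

```python
def propagate_markup(query_labels, results):
    """Propagate <descriptor, value> labels from marked-up queries to results.

    query_labels: dict query_id -> {descriptor: value in [0, 1]}.
    results:      dict query_id -> list of (range_id, score), lower = better.
    Returns dict range_id -> {descriptor: propagated value}.
    """
    labels = {}
    for qid, hits in results.items():
        for range_id, score in hits:
            for desc, value in query_labels[qid].items():
                v = value / (1.0 + score)        # hypothetical decay with score
                slot = labels.setdefault(range_id, {})
                slot[desc] = max(slot.get(desc, 0.0), v)
    return labels
```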

The system could be extended to work with more complicated motions by adding

support for more than one characteristic point. If a query were determined to have mul-

tiple characteristic points, the modified algorithm would start by finding and clustering

seed points for them all. The dynamic time warping phase of the algorithm would have

to be modified to take into account multiple constraints. It would work by finding seed

points in the same order as their corresponding characteristic points. Warps would then

be found between adjacent matching pairs. The adjacency information could then be

represented as a directed graph, and all possible traversing paths enumerated.
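The graph step for multiple characteristic points could look like the following sketch. This hypothetical helper only enumerates temporally ordered chains of seed frames, one per characteristic point, where each seed must fall at most `max_gap` frames after the previous one; the warps between adjacent pairs would still have to be computed and scored separately.

```python
def enumerate_chains(seeds, max_gap):
    """Enumerate all traversing paths of the seed-adjacency DAG.

    seeds:   list over characteristic points, in query order; seeds[k] holds
             the database frame indices of the clusters matching point k.
    max_gap: a seed of point k+1 is adjacent to a seed of point k if it lies
             1..max_gap frames later.
    """
    if not seeds:
        return []
    chains = [[s] for s in seeds[0]]
    for nxt in seeds[1:]:
        # Extend every partial chain by each adjacent seed of the next point.
        chains = [c + [s] for c in chains for s in nxt
                  if 0 < s - c[-1] <= max_gap]
    return chains
```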

7.2.4 Software Development

The various prototype applications developed over the course of this project were designed

with an eye toward their eventual integration. The usefulness of the techniques that they

embody would be greatly enhanced by a synergistic framework. For example, an animator

could create a pose using planar interpolation, then use it as a search key or blend target


without changing programs.

Most of the actual computation involved in the prototypes is done deep within common base classes. Most of the programs use a similar set of structures for tracking their operation. The most disparate component is the user interface. In order to bring the prototypes together, a common interface standard would have to be created. Alternatively, the base classes could be compiled into a plugin for an existing motion editing suite, such as Alias’ Maya. It may be conceptually difficult to fit motion curves into an existing

framework, but if it could be done without too much contortion of the framework design-

ers’ original intentions, it would be the quickest way to move the techniques described in

this thesis into a production environment.

Bibliography

[1] Marc Alexa and Wolfgang Müller. Representing animations by principal components. Comput. Graph. Forum, 19(3), 2000.

[2] Alias. Maya unlimited 7.0.

[3] Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Wu. An optimal algorithm for approximate nearest neighbor searching. In SODA ’94: Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 573–582, Philadelphia, PA, USA, 1994. Society for Industrial and Applied Mathematics.

[4] David Baraff and Andrew Witkin. Physically based modeling: Principles and practice, 1997.

[5] Jernej Barbic, Alla Safonova, Jia-Yu Pan, Christos Faloutsos, Jessica K. Hodgins,

and Nancy S. Pollard. Segmenting motion capture data into distinct behaviors. In

GI ’04: Proceedings of the 2004 conference on Graphics Interface, pages 185–194.

Canadian Human-Computer Communications Society, 2004.

[6] D.A. Becker. Sensei: A real-time recognition, feedback, and training system for t’ai

chi gestures. In Vismod, 1997.


[7] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In KDD-94: AAAI Workshop on Knowledge Discovery in Databases, pages 359–370, July 1994.

[8] Christopher M. Bishop. Neural networks for pattern recognition. Oxford University Press, Oxford, UK, 1996.

[9] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH ’99: Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 187–194, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.

[10] Aaron F. Bobick and Andrew D. Wilson. A state-based approach to the representation and recognition of gesture. IEEE Trans. Pattern Anal. Mach. Intell., 19(12):1325–1337, 1997.

[11] Matthew Brand and Aaron Hertzmann. Style machines. In Kurt Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, pages 183–192. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.

[12] Armin Bruderlin and Lance Williams. Motion signal processing. In SIGGRAPH, pages 97–104, 1995.

[13] L. W. Campbell, D. A. Becker, A. Azarbayejani, A. F. Bobick, and A. Pentland. Invariant features for 3-D gesture recognition. In FG ’96: Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition (FG ’96), page 157, Washington, DC, USA, 1996. IEEE Computer Society.

[14] Marc Cardle, Michalis Vlachos, Stephen Brooks, Eamonn Keogh, and Dimitrios Gunopulos. Fast motion capture matching with replicated motion editing. In SIGGRAPH 2003, Sketches and Applications. ACM Press, July 2003.


[15] Naval Undersea Warfare Center. Gaussian mixtures / HMM toolkit for Matlab. http://www.npt.nuwc.navy.mil/Csf/.

[16] Selina Chu, Eamonn J. Keogh, David Hart, and Michael J. Pazzani. Iterative deepening dynamic time warping for time series. In SDM, 2002.

[17] Charles K. Chui. An Introduction to Wavelets. Academic Press, 1992.

[18] James W. Davis and Aaron F. Bobick. The representation and recognition of human movement using temporal templates. In CVPR ’97: Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR ’97), page 928, Washington, DC, USA, 1997. IEEE Computer Society.

[19] Mira Dontcheva, Gary Yngve, and Zoran Popović. Layered acting for character animation. ACM Trans. Graph., 22(3):409–416, 2003.

[20] Petros Faloutsos, Michiel van de Panne, and Demetri Terzopoulos. Composable controllers for physics-based character animation. In SIGGRAPH ’01: Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 251–260, New York, NY, USA, 2001. ACM Press.

[21] Petros Faloutsos, Michiel van de Panne, and Demetri Terzopoulos. The virtual stuntman: dynamic characters with a repertoire of autonomous motor skills. Computers & Graphics, 25(6):933–953, 2001.

[22] Adam Finkelstein and David H. Salesin. Multiresolution curves. In Proceedings of SIGGRAPH 94, pages 261–268, July 1994.

[23] Kevin Forbes. Summarizing motion in video sequences. http://thekrf.com/projects/motionsummary/, 2004.

[24] Kevin Forbes and Eugene Fiume. An efficient search algorithm for motion data using weighted PCA. In SCA ’05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer animation, Aire-la-Ville, Switzerland, 2005. Eurographics Association.

[25] Pascal Glardon, Ronan Boulic, and Daniel Thalmann. PCA-based walking engine using motion capture data. In Computer Graphics International, pages 292–298, 2004.

[26] Michael Gleicher. Motion editing with spacetime constraints. In SI3D ’97: Proceedings of the 1997 symposium on Interactive 3D graphics, pages 139–ff., New York, NY, USA, 1997. ACM Press.

[27] Michael Gleicher. Retargetting motion to new characters. In SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 33–42, New York, NY, USA, 1998. ACM Press.

[28] F. Sebastián Grassia. Practical parameterization of rotations using the exponential map. J. Graph. Tools, 3(3):29–48, 1998.

[29] Keith Grochow, Steven L. Martin, Aaron Hertzmann, and Zoran Popović. Style-based inverse kinematics. ACM Trans. Graph., 23(3):522–531, 2004.

[30] Lorna Herda, Raquel Urtasun, Pascal Fua, and Andrew Hanson. Automatic determination of shoulder joint limits using quaternion field boundaries. I. J. Robotic Res., 22(6):419–438, 2003.

[31] Jessica K. Hodgins, James F. O’Brien, and Jack Tumblin. Perception of human motion with different geometric models. IEEE Transactions on Visualization and Computer Graphics, 4(4):307–316, 1998.

[32] Jessica K. Hodgins, Wayne L. Wooten, David C. Brogan, and James F. O’Brien. Animating human athletics. In SIGGRAPH ’95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 71–78, New York, NY, USA, 1995. ACM Press.


[33] Michael G. Hollars, Dan E. Rosenthal, and Michael A. Sherman. SD Fast User’s Manual, 1994.

[34] A. Hyvarinen and E. Oja. Independent component analysis: algorithms and applications. Neural Netw., 13(4–5):411–430, 2000.

[35] T. Igarashi, T. Moscovich, and J. F. Hughes. Spatial keyframing for performance-driven animation. In SCA ’05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 107–115, New York, NY, USA, 2005. ACM Press.

[36] Charles E. Jacobs, Adam Finkelstein, and David H. Salesin. Fast multiresolution image querying. In SIGGRAPH ’95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 277–286, New York, NY, USA, 1995. ACM Press.

[37] Michael Patrick Johnson. Exploiting Quaternions to Support Expressive Interactive Character Motion. PhD thesis, Massachusetts Institute of Technology, 2003.

[38] Kanav Kahol, Priyamvada Tripathi, and Sethuraman Panchanathan. Automated gesture segmentation from dance sequences. In Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pages 883–888, 2004.

[39] Ladislav Kavan and Jiri Zara. Spherical blend skinning: a real-time deformation of articulated models. In SI3D ’05: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, pages 9–16, New York, NY, USA, 2005. ACM Press.

[40] Eamonn Keogh, Themis Palpanas, Victor Zordan, Dimitrios Gunopulos, and Marc Cardle. Indexing large human-motion databases. In VLDB 2004, 2004.


[41] Eamonn J. Keogh and Michael J. Pazzani. Scaling up dynamic time warping to massive datasets. In PKDD ’99: Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery, pages 1–11, London, UK, 1999. Springer-Verlag.

[42] Evangelos Kokkevis, Dimitri Metaxas, and Norman I. Badler. User-controlled physics-based animation for articulated figures. In CA ’96: Proceedings of the Computer Animation, page 16, Washington, DC, USA, 1996. IEEE Computer Society.

[43] Lucas Kovar and Michael Gleicher. Flexible automatic motion blending with registration curves. In SCA ’03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer animation, pages 214–224, Aire-la-Ville, Switzerland, 2003. Eurographics Association.

[44] Lucas Kovar and Michael Gleicher. Automated extraction and parameterization of motions in large data sets. ACM Trans. Graph., 23(3):559–568, 2004.

[45] Lucas Kovar, Michael Gleicher, and Frédéric Pighin. Motion graphs. In SIGGRAPH ’02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 473–482, New York, NY, USA, 2002. ACM Press.

[46] Richard Kulpa, Franck Multon, and Bruno Arnaldi. Morphology-independent representation of motions for interactive human-like animation. In Eurographics 2005, August 2005.

[47] Joe Laszlo, Michael Neff, and Karan Singh. Predictive feedback for interactive control of physics-based characters. In Eurographics 2005, August 2005.

[48] Joseph Laszlo, Michiel van de Panne, and Eugene Fiume. Interactive control for physically-based animation. In SIGGRAPH ’00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 201–208, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.


[49] Yan Li, Tianshu Wang, and Heung-Yeung Shum. Motion texture: a two-level statistical model for character motion synthesis. In SIGGRAPH ’02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 465–472. ACM Press, 2002.

[50] José M. Martínez, Rob Koenen, and Fernando Pereira. MPEG-7: the generic multimedia content description standard. IEEE Computer Society, pages 78–87, 2002.

[51] Scott McCloud. Understanding Comics. Perennial Currents, 1994.

[52] David M. Mount. ANN Programming Manual, 2005.

[53] J. C. Nebel. Keyframe animation of articulated figures using autocollision-free interpolation. In 17th Eurographics UK Conference ’99, Cambridge, UK, April 1999.

[54] Michael Neff and Eugene Fiume. Modeling tension and relaxation for computer animation. In SCA ’02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 81–88, New York, NY, USA, 2002. ACM Press.

[55] Michael Neff and Eugene Fiume. Methods for exploring expressive stance. In SCA ’04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 49–58, New York, NY, USA, 2004. ACM Press.

[56] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Readings in speech recognition, pages 267–296, 1990.

[57] Stephane Redon, Young J. Kim, Ming C. Lin, and Dinesh Manocha. Fast continuous collision detection for articulated models. In Proceedings of ACM Symposium on Solid Modeling and Applications, 2004.


[58] Charles Rose, Michael F. Cohen, and Bobby Bodenheimer. Verbs and adverbs: Multidimensional motion interpolation. IEEE Comput. Graph. Appl., 18(5):32–40, 1998.

[59] Arno Schodl, Richard Szeliski, David H. Salesin, and Irfan Essa. Video textures. In Kurt Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, pages 489–498. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.

[60] Ari Shapiro, Fred Pighin, and Petros Faloutsos. Hybrid control for interactive character animation. In PG ’03: Proceedings of the 11th Pacific Conference on Computer Graphics and Applications, page 455, Washington, DC, USA, 2003. IEEE Computer Society.

[61] Hyun Joon Shin, Jehee Lee, Sung Yong Shin, and Michael Gleicher. Computer puppetry: An importance-based approach. ACM Trans. Graph., 20(2):67–94, 2001.

[62] Jeremy G. Siek, Lie-Quan Lee, and Andrew Lumsdaine. The Boost Graph Library: User Guide and Reference Manual, 2002.

[63] Danijel Skocaj and Ales Leonardis. Weighted incremental subspace learning. In Workshop on Cognitive Vision, proceedings, Zurich, Switzerland, September 19–20, 2002.

[64] T. Starner and A. Pentland. Visual recognition of American Sign Language using hidden Markov models. In International Workshop on Automatic Face and Gesture Recognition, pages 189–194, 1995.

[65] Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin. Wavelets for computer graphics: A primer, part 1. IEEE Comput. Graph. Appl., 15(3):76–84, 1995.

[66] Harold C. Sun and Dimitris N. Metaxas. Automating gait generation. In SIGGRAPH, pages 261–270, 2001.


[67] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.

[68] Atulya Velivelli, ChengXiang Zhai, and Thomas S. Huang. Audio segment retrieval using a short duration example query. In ICME, pages 1603–1606, 2004.

[69] M. Vlachos, G. Kollios, and D. Gunopulos. Discovering similar multidimensional trajectories. In Proceedings of the 18th ICDE, pages 673–684, San Jose, CA, 2002.

[70] Andrew D. Wilson and Aaron F. Bobick. Realtime online adaptive gesture recognition. In ICPR, pages 1270–1275, 2000.

[71] Po-Feng Yang, Joe Laszlo, and Karan Singh. Layered dynamic control for interactive character swimming. In SCA ’04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 39–47, New York, NY, USA, 2004. ACM Press.


[72] Atsuo Yoshitaka and Tadao Ichikawa. A survey on content-based retrieval for multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11(1):81–93, 1999.

[73] Victor Brian Zordan and Jessica K. Hodgins. Motion capture-driven simulations that hit and react. In SCA ’02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 89–96, New York, NY, USA, 2002. ACM Press.
