A Survey - Human Movement Tracking and Stroke Rehabilitation

TECHNICAL REPORT: CSM-420
ISSN 1744-8050

Huiyu Zhou and Huosheng Hu

8 December 2004

Department of Computer Sciences
University of Essex
United Kingdom

Email: [email protected], [email protected]


Contents

1 Introduction

2 Sensor technologies
  2.1 Non-vision based tracking
  2.2 Vision based tracking with markers
  2.3 Vision based tracking without markers
  2.4 Robot assisted tracking

3 Human movement tracking: non-vision based systems
  3.1 MT9 based
  3.2 G-link
  3.3 MotionStar
  3.4 InterSense
  3.5 Polhemus
    3.5.1 LIBERTY
    3.5.2 FASTRAK
    3.5.3 PATRIOT
  3.6 HASDMS-I
  3.7 Glove-based analysis
  3.8 Non-commercial systems
  3.9 Other techniques

4 Vision based tracking systems with markers
  4.1 Qualisys
  4.2 VICON
  4.3 CODA
  4.4 ReActor2
  4.5 ELITE Biomech
  4.6 APAS
  4.7 Polaris
  4.8 Others

5 Vision based tracking systems without markers
  5.1 2-D approaches
    5.1.1 2-D approaches with explicit shape models
    5.1.2 2-D approaches without explicit shape models
  5.2 3-D approaches
    5.2.1 3-D modelling
    5.2.2 Stick figure
    5.2.3 Volumetric modeling
  5.3 Camera configuration
    5.3.1 Single camera tracking
    5.3.2 Multiple camera tracking
  5.4 Segmentation of human motion

6 Robot-guided tracking systems
  6.1 Discriminating static and dynamic activities
  6.2 Typical working systems
    6.2.1 Cozens
    6.2.2 MANUS
    6.2.3 Taylor and improved systems
    6.2.4 MIME
    6.2.5 ARM-Guide
  6.3 Other relevant techniques

7 Discussion
  7.1 Remaining challenges
  7.2 Design specification for a proposed system

8 Conclusions


Abstract

This technical report reviews recent progress in human movement tracking systems in general, and patient rehabilitation in particular. Major achievements of previous working systems are summarized, and open problems in motion tracking are highlighted along with possible solutions. Finally, the remaining challenges are discussed and a design specification is proposed for a potential tracking system.

1 Introduction

Evidence shows that, in 2001-02, 130,000 people in the UK experienced a stroke [62] and required admission to hospital. More than 75% of these people were elderly, and required locally based multi-disciplinary assessment and appropriate rehabilitative treatment after they were discharged from hospital [35], [48]. As a consequence, the demand on healthcare services, and the expense to the national health service, increased greatly. To ease this burden, intelligently designed equipment can be used to conduct rehabilitation in the patient's home rather than in a hospital that may be geographically remote. Such systems are expected to reduce the need for face-to-face therapy between therapists, who provide visual and audio support, and patients.

The goal of rehabilitation is to enable a person who has experienced a stroke to regain the highest possible level of independence so that they can be as productive as possible. Since stroke patients often have complex rehabilitation needs, progress and recovery characteristics are unique to each person. Although a majority of functional abilities may be restored soon after a stroke, recovery is an ongoing process. Home-based rehabilitation systems are therefore expected to offer adaptive settings that meet the requirements of individuals, automatic operation, an open human-machine interface, a rich database for later evaluation, and compactness and portability. In fact,

Figure 1. A rehabilitation system at the Massachusetts Institute of Technology (MIT), USA.

rehabilitation is a dynamic process which uses available facilities to correct any undesired motion behavior in order to reach a goal (e.g. an ideal position).

During the rehabilitation process, the movement of stroke patients needs to be localized and learned so that incorrect movements can be instantly corrected or tuned. Tracking these movements is therefore vital during the course of rehabilitation. This report surveys the technologies deployed by human movement tracking systems that continually update the spatiotemporal information of patients. Previous systems (one of them shown in Figure 1) have proved that, to some extent, properly conducted designs are capable of improving the quality of human movement, but many challenges still remain due to the complexity of, and uncertainty in, movement. In the following sections, a comprehensive review of such systems is provided.

The rest of this report is organized as follows. Section 2 outlines the four main types of technologies used in human movement tracking. Section 3 presents non-vision based human movement tracking systems, which have been commercialized. Marker-based visual tracking systems are introduced in Section 4, and markerless visual systems are described in Section 5. Section 6 provides robot-guided tracking system concepts and a description of their application in the rehabilitation procedure. A research proposal based on previous work at the University of Essex and the literature is provided in Section 7. Finally, conclusions are drawn in Section 8.

Figure 2. Illustration of a real human movement tracking system (courtesy of Axel Mulder, Simon Fraser University).

2 Sensor technologies

Human movement tracking systems generate real-time data that represents measured human movement [80], based on different sensor technologies. For example, Figure 2 illustrates a hybrid human movement tracking system. Retrieving such sensing information allows a system to efficiently describe human movement, e.g. arm motion. However, it is recognized that sensor data can be encoded with noise or error due to relative movement between the sensor and the objects to which it is attached. It is therefore essential to understand the structure and characteristics of sensors before they are applied to a tracking system. According to the location of the sensors relative to the human body, tracking systems can be classified as non-vision based, vision based with markers, vision based without markers, and robot assisted systems. These systems are described in turn in the following sections.

2.1 Non-vision based tracking

In non-vision based systems, sensors are attached to the human body to collect movement information. These sensors are commonly classified as mechanical, inertial, acoustic, radio or microwave, and magnetic. Some of them have a small enough sensing footprint to monitor small-amplitude movements, such as those of a finger or toe. Each kind of sensor has advantages and limitations. Limitations include modality-specific, measurement-specific and circumstance-specific limitations that accordingly affect the use of the sensor in different environments [108].

For example, among inertial sensors, accelerometers (Figure 3) convert linear acceleration, angular acceleration or a combination of both into an output signal [31]. There are three common types of accelerometer: piezoelectric, which exploit the piezoelectric effect whereby a naturally occurring quartz crystal is used to produce an electric charge between two terminals; piezoresistive, which operate by measuring the resistance of a fine wire as it is mechanically deformed by a proof mass [71]; and variable capacitive, where the change in capacitance is proportional to acceleration or deceleration [110]. An


Figure 3. Illustration of a piezoresistive sensor.

example of accelerometers is given in Figure 4. Unfortunately, these sensors demand some computing power, which may increase response latency. Furthermore, resolution and signal bandwidth are normally limited by the interface circuitry [28].
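Whatever the sensing principle, the raw output must be calibrated into physical units before use. The sketch below converts a raw ADC count into acceleration with a linear sensitivity model; the zero-g offset and counts-per-g values are hypothetical, not taken from any particular device, and real values would come from the datasheet or a static calibration against gravity.

```python
def counts_to_accel(raw_counts, zero_g_counts=512.0, counts_per_g=102.4):
    """Convert a raw ADC reading to acceleration in m/s^2.

    zero_g_counts and counts_per_g are assumed calibration constants;
    a real sensor supplies its own values.
    """
    G = 9.80665  # standard gravity, m/s^2
    return (raw_counts - zero_g_counts) / counts_per_g * G

# With these assumed constants, 614.4 counts corresponds to +1 g.
a = counts_to_accel(614.4)
```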

2.2 Vision based tracking with markers

This technique uses optical sensors, e.g. cameras, to track human movements, which are captured by placing identifiers on the human body. As the human skeleton is a highly articulated structure, twists and rotations make the movement fully three-dimensional. As a consequence, each body part continuously moves in and out of occlusion from the view of the cameras, leading to inconsistent and unreliable tracking of the human body. As a good solution to this problem, marker-based vision systems have attracted the attention of researchers in medical science, sports science and engineering.

One major drawback of using optical sensors and markers, however, is that joint rotation is difficult to sense accurately, making it infeasible to build a true 3-D model of the sensed objects [102].

Figure 4. Entran's family of miniature accelerometers.

2.3 Vision based tracking without markers

This technique exploits external sensors such as cameras to track the movement of the human body without markers. It is motivated by problems observed in marker-based vision systems [1]: (1) identification of standard bony landmarks can be unreliable; (2) the soft tissue overlying bony landmarks can move, giving rise to noisy data; (3) the marker itself can wobble due to its own inertia; (4) markers can even come adrift completely.

A camera can have a resolution of a million pixels, which is one of the main reasons that such optical sensors have attracted attention. However, such vision based techniques require intensive computational power to run efficiently and to reduce data latency [32]. Moreover, high speed cameras are also required, as fewer than sixty frames per second conventionally provides insufficient bandwidth for accurate data representation [24].

2.4 Robot assisted tracking

Recently, voluntary repetitive exercises administered with the mechanical assistance of robotic rehabilitators have proven effective in improving arm movement ability in post-stroke populations. During the course of rehabilitation, human movement is measured by sensors attached to the body, which may be electromechanical or electromagnetic. Electromechanical systems restrict free movement and require sensors to be attached to and detached from the human body. The electromagnetic approach provides more freedom of movement, but is seriously affected by the directionality of the sensors.

3 Human movement tracking: non-vision based systems

Understanding and interpreting human behavior has attracted the attention of therapists and biometric researchers due to its impact on patient recovery after disease. One therefore needs to learn the dynamic characteristics of the actions of certain parts of the body, e.g. through hand-gesture and gait analysis. Tracking such actions is an effective means of consistently and reliably representing human dynamics over time. This can be achieved through the use of electromechanical or electromagnetic sensors, in what is called "non-vision based tracking". Among the sensors and systems introduced below, the MT9 based, G-link based and MotionStar systems are wireless, meaning that they are not spatially constrained.

3.1 MT9 based

The MT9 [64] of Xsens Motion Tech is a digital measurement unit that measures 3-D rate-of-turn, acceleration and the earth's magnetic field, as shown in Figure 5. Combined with the MT9 Software, it provides real-time 3-D orientation data in the form of Euler angles and quaternions, at frequencies of up to 512 Hz and with an accuracy better than 1 degree root-mean-square (RMS).

The algorithm of the MT9 system is equivalent to a sensor fusion process in which measurements of gravity through accelerometers and of magnetic north via magnetometers compensate for the accumulating error from integrating the rate-of-turn data. Hence, this drift compensation is attitude and heading referenced. In a homogeneous earth-magnetic field, the MT9 system has an angular resolution of 0.05 degrees RMS, a static accuracy better than 1.0 degree, and a dynamic accuracy of 3 degrees RMS.
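The drift-compensation idea can be illustrated with a minimal one-axis complementary filter: integrate the gyro, then pull the estimate toward the drift-free tilt angle derived from gravity. This is a simplified stand-in for the MT9's actual fusion algorithm, and the blend factor `alpha` is an assumed value, not an MT9 parameter.

```python
import math

def complementary_filter(gyro_rates, accels, dt, alpha=0.98):
    """One-axis sketch of gyro/accelerometer fusion.

    gyro_rates: pitch rates in rad/s, one per sample
    accels: (ax, az) accelerometer pairs in m/s^2, one per sample
    dt: sample period in seconds
    """
    angle = 0.0
    estimates = []
    for rate, (ax, az) in zip(gyro_rates, accels):
        gyro_angle = angle + rate * dt       # dead-reckoned estimate (drifts with bias)
        accel_angle = math.atan2(ax, az)     # gravity reference (noisy, but no drift)
        angle = alpha * gyro_angle + (1 - alpha) * accel_angle
        estimates.append(angle)
    return estimates
```

For a stationary sensor with a small gyro bias, pure integration drifts linearly with time, while the blended estimate settles near the true angle.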

Due to its compact size and reliable performance, the MT9 has readily been integrated into biomechanics, robotics, animation, virtual reality and other fields. However, an MT9-based tracker with six MT9 units costs about 16,000 euros.

Figure 5. Illustration of MT9.

3.2 G-link

G-Link of MicroStrain is a high speed, triaxial accelerometer node designed to operate as part of an integrated wireless sensor network system [2], as shown in Figure 6. The base station transceiver may trigger data logging (from 30 meters away), or request previously logged data to be transmitted to the host PC for data acquisition, display and analysis. Featuring 2 kHz sweep rates combined with 2 Mbytes of flash memory, these little nodes pack a lot of power into a small package. Every node in the wireless network is assigned a unique 16 bit address, so a single host transceiver can address thousands of multichannel sensor nodes. The base station can trigger all the nodes simultaneously, and timing data is sent by the base station along with the trigger. This timing data is logged by the sensor nodes along with the sensor data.

G-Link may also be wirelessly commanded to transmit data continually, at 1 kHz sweep rates, for a pre-programmed time period. This continuous, fast wireless transmission mode allows real-time data acquisition and display from a single multichannel sensor node at a time. G-Link has two acceleration ranges, +/- 2 g and +/- 10 g, and its battery lifespan can be up to 273 hours. Furthermore, the product has a small transceiver size of 25×25×5 mm. A G-Link starter kit, including two G-Link data-logging transceivers (+/- 10 g full scale range), one base station, and all necessary software and cables, costs about 2,000 US dollars.

Figure 6. A G-link unit.

As a wireless sensor, the 3DM-G combines angular rate gyros with three orthogonal DC accelerometers and three orthogonal magnetometers to output its orientation. The product can be operated over the full 360 degrees of angular motion on all three axes, with a +/- 300 degrees/sec angular velocity range, 0.1 degrees repeatability and +/- 5 degrees accuracy. A gyro enhanced 3-axis orientation system starter kit, the 3DM-G-485-SK, consisting of one 3DM-G-485-M orientation module, one 3DM-G-485-CBL-PWR communication cable and power supply, the 3DM-G Software Suite for Win 95/98/2000/XP and a user manual, costs approximately 1,500 US dollars.

3.3 MotionStar

MotionStar is a magnetic motion capture system produced by the Ascension Technology Corporation in the USA. The system applies DC magnetic tracking technology, which is significantly less susceptible to metallic distortion than AC electromagnetic tracking technologies. It provides real-time data output, capturing significant amounts of motion data in short order. Regardless of the number of sensors tracked, one can get up to 120 measurements per sensor per second. The system achieves six degree-of-freedom measurement: each sensor calculates both position (x, y, z) and orientation (azimuth, elevation, roll), for full 360 degree coverage without the "line of sight" blocking problems of optical systems. Because each sensor samples six data points, fewer sensors are needed. The communication between the console and the sensors is wireless.

Figure 7. Motionstar Wireless 2.

Figure 8. InterSense IS-300 Pro.

MotionStar Wireless 2 (Figure 7) is a magnetic tracker for capturing the motion of one or more performers. Data is sent via a wireless communications link to a base station. Its performance figures are good: (1) translation range: +/- 3.05 m; (2) angular range: all attitude, +/- 180 deg for azimuth and roll, +/- 90 deg for elevation; (3) static position resolution: 0.08 cm at 1.52 m range; (4) static orientation resolution: 0.1 deg RMS at 1.52 m range. Unfortunately, the communication range is only 12 feet (radius).

A further drawback is that a system with six sensors costs around 56,000 US dollars.

3.4 InterSense

InterSense's updated product, the IS-300 Pro Precision Motion Tracker, is shown in Figure 8. This system virtually eliminated the jitter common to other systems. It featured update rates of up to 500 Hz and a steady response in metal-cluttered environments. The signal processor was small enough to wear on a belt for tetherless applications. Furthermore, this system was the only one that predicted motion up to 50 ms ahead and compensated for graphics rendering delays, further helping to eliminate simulator lag. It has therefore been used successfully to implement feed-forward motion prediction strategies.

3.5 Polhemus

Polhemus [3] is a leading global provider of 3-D position/orientation tracking systems, digitizing technology solutions, eye-tracking systems and handheld three-dimensional scanners. The company was founded in 1969 by Bill Polhemus in Grand Rapids, MI, and in early 1971 moved to the Burlington area. Polhemus provides several fast and easy digital tracking systems: LIBERTY, FASTRAK and PATRIOT.

3.5.1 LIBERTY

LIBERTY was the forerunner in electromagnetic tracking technology (Figure 9). It computed at an extraordinary rate of 240 updates per second per sensor, and could be upgraded from four sensor channels to eight by the addition of a single circuit board. It had a latency of 3.5 milliseconds, a position resolution of 0.00015 in (0.038 mm) at 12 in (30 cm) range, and an orientation resolution of 0.0012 degrees. The system provided an easy, intuitive user interface. Its applications were boundless, from biomechanical and sports analysis to virtual reality.

3.5.2 FASTRAK

FASTRAK was a solution for accurately computing position and orientation through space (Figure 10). With real-time, six-degree-of-freedom tracking and virtually no latency, this award-winning system was ideal for head, hand and instrument tracking, as well as biomedical motion and limb rotation, graphics and cursor control, stereotaxic localization, telerobotics, digitizing, and pointing.

Figure 9. Illustration of LIBERTY by Polhemus.

Figure 10. Illustration of FASTRAK by Polhemus.

3.5.3 PATRIOT

PATRIOT was a cost effective solution for six-degree-of-freedom tracking and 3-D digitizing. A good answer to the position/orientation sensing requirements of 3-D applications and environments where cost is a primary concern, it was ideal for head tracking, biomechanical analysis, computer graphics, cursor control, and stereotaxic localization. See Figure 11.

3.6 HASDMS-I

Human Performance Measurement, Inc. provides the HASDMS-I Human Activity State Detection and Monitoring System [4]. The Model HASDMS-I is designed to detect and log selected human activity states over prolonged periods (up to 7 days). It consists of a Sensing and Logging Unit (SLU) and Windows-based host software that runs on a user-supplied PC. The system is based on the observation that, while the activities humans engage in are often quite complex dynamically and kinematically, there are distinct patterns that lead us to identify these activities with specific words such as standing, walking, etc. Such words are referred to as "activity states".

Figure 11. Illustration of PATRIOT by Polhemus.

The HASDMS-I was designed to provide the greatest activity discrimination with the smallest possible sensor array (i.e., one sensing site on the body). The SLU is a compact, battery-powered instrument with special sensors and a microprocessor, mounted on the lateral aspect of the monitored subject's thigh. It detects four distinct activity states: (1) lying-sitting (grouped), (2) standing, (3) walking, and (4) running. A fifth state ("unknown") is also provided to discriminate unusual patterns from those which the system is designed to detect.

The SLU is first connected to a host PC (via a simple serial port connection) for initialization and start-up. It is then attached to a subject for an unsupervised monitoring session. When the session is complete, the SLU is again connected to the host PC and the logged data is uploaded to the host software for databasing, display, and analysis. Several standard activity summaries are provided, including (1) the percentage of time spent in different states and (2) the total amount of time (hours, minutes, seconds) spent in different states. In addition, an Activity State History Graph depicts the type and duration of each state in a time-sequenced, scrollable window. Activity state data can be printed or exported as an ASCII text file for any other user-specified analyses. The HASDMS-I system is shown in Figure 12.

Figure 12. HASDMS-I from Human Performance Measurement, Inc.

Figure 13. Illustration of a glove-based prototype (image courtesy of KITTY TECH).
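The summaries above are simple aggregations over the logged session. A sketch, assuming the log is a list of (state, duration) records; the real SLU log layout is not documented here.

```python
from collections import defaultdict

def activity_summary(log):
    """Total seconds and percentage of session time per activity state.

    log: list of (state, duration_seconds) records, as a HASDMS-I-style
    logger might produce (an assumed format).
    Returns {state: (total_seconds, percent_of_session)}.
    """
    totals = defaultdict(float)
    for state, seconds in log:
        totals[state] += seconds
    grand = sum(totals.values())
    return {s: (t, 100.0 * t / grand) for s, t in totals.items()}

# A hypothetical 600-second session:
log = [("standing", 120), ("walking", 300), ("standing", 60), ("lying-sitting", 120)]
summary = activity_summary(log)
# walking accounts for 300 s, i.e. 50% of the session
```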

3.7 Glove-based analysis

Since the late 1970s, glove-based devices have been studied for the analysis of hand gestures. Glove-based devices use sensors attached to a glove that transduce finger flexion and abduction into electrical signals in order to determine the hand pose (Figure 13).

The Dataglove (originally developed by VPL Research) was a neoprene fabric glove with two fiber optic loops on each finger. Each loop was dedicated to one knuckle, which can be a problem: if a user has extra large or small hands, the loops will not correspond well to the actual knuckle positions and the user will not be able to produce very accurate gestures. At one end of each loop is an LED and at the other end is a photosensor. The fiber optic cable has small cuts along its length. When the user bends a finger, light escapes from the fiber optic cable through these cuts. The amount of light reaching the photosensor is measured and converted into a measure of how much the finger is bent. The Dataglove requires recalibration for each user [117].
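The per-user recalibration can be sketched as a two-point mapping from photosensor reading to bend angle. The linear response and the example count values are assumptions for illustration; the Dataglove's actual light-loss curve is device-specific and would need a lookup table or polynomial fit.

```python
def make_bend_estimator(reading_straight, reading_full_bend, full_bend_deg=90.0):
    """Build a per-user estimator from two calibration poses.

    reading_straight / reading_full_bend: photosensor counts recorded
    with the finger straight and fully bent (hypothetical units).
    Assumes light loss varies roughly linearly with flexion.
    """
    span = reading_straight - reading_full_bend
    def estimate(reading):
        frac = (reading_straight - reading) / span
        return max(0.0, min(1.0, frac)) * full_bend_deg  # clamp to valid range
    return estimate

# Calibrate: 1000 counts with the finger straight, 200 fully bent.
bend = make_bend_estimator(1000, 200)
```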

The CyberGlove system included one CyberGlove [5], an instrumentation unit, a serial cable to connect to the host computer, and an executable version of the VirtualHand graphic hand model display and calibration software. Many applications require measurement of the position and orientation of the forearm in space; to accomplish this, mounting provisions for Polhemus and Ascension 6 degree-of-freedom tracking sensors are available for the glove wristband (tracking sensors are not included in the basic CyberGlove system). The CyberGlove had a software-programmable switch and LED on the wristband to permit the system software developer to provide the CyberGlove wearer with additional input/output capability. The instrumentation unit provided a variety of convenient functions and features, including time-stamping, CyberGlove status, external sampling synchronization and analog sensor outputs.

Based on the design of the Dataglove, the PowerGlove was developed by Abrams-Gentile Entertainment (AGE Inc.) for Mattel through a licensing agreement with VPL Research. The PowerGlove consists of a sturdy Lycra glove with flat plastic strain-gauge fibers, coated with conductive ink, running up each finger; these measure the change in resistance during bending to estimate the degree of flex of the finger as a whole. It employs an ultrasonic system on the back of the glove to track the roll of the hand, reported as one of twelve possible roll positions. The ultrasonic transmitters must be oriented toward the microphones to get an accurate reading; pitching or yawing the hand changes the orientation of the transmitters and the signal is lost by the microphones, so the tracking mechanism is poor. In total, four dimensions are measured: x, y, z and roll.

Similar technologies include the 5DT DataGlove [6], PINCH Gloves [7], and Hand Master [8].

3.8 Non-commercial systems

The commercial systems described above embody stable and consistent technologies. Nevertheless, they are sold at high prices, which greatly limits their application in the community. As a result, researchers have proposed affordable, compact and friendly systems instead. An example is given below.

Dukes [45] developed a compact system comprising two parts: an embedded hand unit that encapsulated the hardware necessary for capturing human arm movement, and a software interface, implemented on a computer terminal, for displaying the collected data.

Within the embedded hand unit, a microcontroller gathered data from two accelerometers. The collected data was then transmitted to the computer terminal for analysis. The software interface collected the data from the embedded hand unit and presented it to the user both statically and dynamically, in the form of a three-dimensional animation. The whole system successfully captured human movement. Moreover, the transfer of data from the hand unit to the terminal was consistently achieved in an asynchronous mode. On the computer terminal, the collected data clearly illustrated the continuous sampling of the FM transmitter and receiver modules, as demonstrated in Figure 14.

However, the animation shown on the terminal failed to correctly display human movement in terms of distance travelled and speed of movement. This is because the accelerometer data was output directly, without any computation of distance. To produce a correct display, this data needs to be resampled and further processed on the terminal to recover the travelled distance over the corresponding time.
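The missing processing step amounts to a double integration of the acceleration samples. A minimal sketch with the trapezoidal rule follows; it ignores the bias removal and resampling a real implementation would need, since any constant accelerometer offset grows quadratically in the position estimate.

```python
def accel_to_displacement(accels, dt):
    """Double-integrate acceleration samples (m/s^2) into displacement (m).

    accels: uniformly sampled acceleration along one axis
    dt: sample period in seconds
    Returns the displacement at each sample, starting from rest at 0.
    """
    velocities = [0.0]
    vel = 0.0
    for a_prev, a_next in zip(accels, accels[1:]):
        vel += 0.5 * (a_prev + a_next) * dt   # trapezoidal integration to velocity
        velocities.append(vel)
    positions = [0.0]
    pos = 0.0
    for v_prev, v_next in zip(velocities, velocities[1:]):
        pos += 0.5 * (v_prev + v_next) * dt   # and again, velocity to position
        positions.append(pos)
    return positions
```

For a constant 2 m/s^2 over one second the result matches the closed form 0.5·a·t² = 1 m exactly, since the trapezoidal rule is exact for linear velocity profiles.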



Figure 14. Demo of Dukes's approach: (a) collected data on the x-axis, and (b) collected data on the y-axis.

3.9 Other techniques

Acoustic systems collect information by transmitting and sensing sound waves, timing the flight duration of a brief ultrasonic pulse. Such systems are used in medical applications [46], [83], [91], but have not been used in motion tracking. This is due to inherent drawbacks of ultrasonic systems: (1) the efficiency of an acoustic transducer is proportional to its active surface area, so large devices are required; (2) to improve the detection range the frequency of the ultrasonic waves must be low (e.g. 10 Hz), but this affects system latency in continuous measurement; (3) acoustic systems require a line of sight between the emitters and the receivers.
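The time-of-flight ranging these systems perform is straightforward; a sketch, assuming a one-way pulse, line of sight, and a standard linear temperature model for the speed of sound in air.

```python
def ultrasonic_range(pulse_flight_s, temp_c=20.0):
    """Range in meters from a one-way ultrasonic pulse time-of-flight.

    pulse_flight_s: measured flight duration in seconds
    temp_c: air temperature; the speed of sound is approximated as
    331.3 + 0.606*T m/s (a common linear approximation).
    """
    speed = 331.3 + 0.606 * temp_c  # m/s
    return speed * pulse_flight_s

# A 2 ms flight at 20 C corresponds to roughly 0.69 m.
d = ultrasonic_range(0.002)
```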

Radio and microwaves are normally used in navigation systems and airport landing aids [108], although they have no application in human motion tracking. Electromagnetic wave-based tracking approaches can provide range information because the radiated energy falls off with radius r as 1/r². For example, using a delay-locked loop (DLL), the Global Positioning System (GPS) can achieve a resolution of 1 meter. Obviously, this is not enough for human motion, which typically involves displacements of 40-50 cm per second. One radio frequency-based precision motion tracker achieved a surprisingly good resolution of a few millimeters, but it used large racks of microwave equipment and was demonstrated only in an empty room. A hybrid system could therefore potentially obtain higher resolution, but incurs integration difficulties.

4 Vision based tracking systems withmarkers

In 1973, Johansson performed his famous Moving Light Display (MLD) psychological experiment on the perception of biological motion [69]. He attached small reflective markers to the joints of human subjects, allowing these markers to be monitored along their trajectories. This experiment became a milestone of human movement tracking. Although Johansson's work established a solid theoretical basis for human movement tracking, the approach still faces challenges such as errors, lack of robustness and expensive computation, due to environmental constraints, mutual occlusion and complicated processing. However, tracking systems with markers minimize the uncertainty of subject movements thanks to the unique appearance of the markers. Consequently, plenty of marker-based tracking systems are available on the market today. Studying these systems allows their advantages to be exploited in a further developed platform.

4.1 Qualisys

A Qualisys motion capture system, depicted in Figure 15, consists of 1 to 16 cameras, each emitting a beam of infrared light [9]. Small reflective markers are placed on the object or person to be measured. The cameras flash infrared light and the markers reflect it back to the cameras. Each camera measures the 2-dimensional position of a reflective target; by combining the 2-D data from several cameras, a 3-D position is calculated. The data can be analyzed in Qualisys Motion Manager

Figure 15. An operating Qualisys system.

Figure 16. Reflective markers used in a real-time VICON system.

(QMM) or exported in several external formats.

This system can be combined with Visual3D, an advanced analysis package for managing and reporting optical 3-D data, to track each segment of the model. The pose (position and orientation) of each segment is determined by 3 or more non-collinear points attached to the segment. For better accuracy, a cluster of targets can be rigidly attached to a shell, which prevents the targets from moving relative to each other; this shell is then affixed to the segment. The kinematic model is calculated by determining the transformation from the recorded tracking targets to a calibration pose.
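The segment-pose step (recovering the rigid transformation from the calibration pose to the tracked marker cluster) can be sketched as a least-squares fit over the three or more non-collinear points, using the standard Kabsch algorithm. This is an illustrative reconstruction of the idea, not Visual3D's actual implementation:

```python
import numpy as np

def segment_pose(calib_pts: np.ndarray, tracked_pts: np.ndarray):
    """Least-squares rigid transform (R, t) mapping calibration-pose
    marker positions onto their tracked positions (Kabsch algorithm).
    Both inputs are N x 3 arrays of N >= 3 non-collinear points."""
    mu_c, mu_t = calib_pts.mean(axis=0), tracked_pts.mean(axis=0)
    H = (calib_pts - mu_c).T @ (tracked_pts - mu_t)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_c
    return R, t
```

Rigidly attaching a cluster of targets to a shell, as described above, keeps the point configuration fixed so that this least-squares fit stays well conditioned.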

4.2 VICON

VICON, a 3-D optical tracking system, was specifically designed for use in virtual and immersive environments [63]. By combining Vicon's high-speed, high-resolution cameras with new automated Tracker software, the system delivers immediate and precise manipulation of graphics for first-person immersive environments in military, automotive, and aerospace visualizations. Precise, low-latency and jitter-free motion tracking, though key to creating a realistic sense of immersion in visualizations and simulations, was not previously possible due to the lag, inaccuracy, unpredictability and unresponsiveness of electromagnetic, inertial and ultrasonic tracking options.

The VICON Tracker, which offers wireless, extremely low-latency performance with six degrees of freedom (DOF) and zero environmental interference, outclasses these obsolescent systems, yet is the simplest to set up and calibrate. Targets are tracked by proprietary CMOS VICON cameras ranging in resolution from 640x480 to 1280x1024 and operating at 200-1000 Hz. The entire range of cameras is designed, developed and built specifically for motion tracking.

At the heart of the system, the VICON Tracker software automatically calculates the center of every marker, reconstructs its 3-D position, identifies each marker and object, and outputs 6-DOF information, typically in less than 7 milliseconds. The strength of the Vicon Tracker software lies in its automation. The very first Tracker installation requires about an hour of system set-up; each following session requires only that the PC running the software be switched on. Objects with three or more markers automatically output motion data that can be applied to 3-D objects in real time and integrated into a variety of immersive 3-D visualization applications, including EDS Jack, Dassault Delmia, VRCOM, Fakespace, VRCO Track D and others. Figure 16 shows reflective markers of a real-time VICON system applied to two subjects.

Figure 17. CODA system.

4.3 CODA

CODA is an acronym for Cartesian Opto-electronic Dynamic Anthropometer, a name first coined in 1974 as the working title of an early research instrument developed at Loughborough University, United Kingdom, by David Mitchelson and funded by the UK Science Research Council [10]; the system is illustrated in Figure 17.

The system is pre-calibrated for 3-D measurement, which means that the lightweight sensor can be set up at a new location in a matter of minutes, without the need to recalibrate using a space-frame. Up to six sensor units can be used together and placed around a capture volume to give extra sets of eyes and maximum redundancy of viewpoint. This enables the Codamotion system to track the 360-degree movements that often occur in animation and sports applications. The active markers are always intrinsically identified by virtue of their position in a time-multiplexed sequence, so confused or swapped trajectories cannot happen with the Codamotion system, no matter how many markers are used or how close they are to each other.

The calculation of the 3-D coordinates of the markers is done in real time with an extremely low delay of 5 milliseconds. Special versions of the system are available with latency shorter than 1 millisecond. This opens up many applications that require real-time feedback, such as research

Figure 18. ReActor2 system.

in neuro-physiology and high-quality virtual reality systems, as well as tightly coupled real-time animation. It is also possible to trigger external equipment using the real-time Codamotion data. At a three-metre distance, the system achieves the following accuracy: +/-1.5 mm in the X and Z axes and +/-2.5 mm in the Y axis, for peak-to-peak deviations from the actual position.

4.4 ReActor2

A product of Ascension Technology Corporation, the ReActor2 digital active-optical tracking system, shown in Figure 18, captures the movements of an untethered performer who is free to move in a capture area bordered by modular bars that fasten together. The digital detectors embedded in the frame provide full coverage of performers while minimizing blocked markers. Instant Marker Recognition immediately reacquires blocked markers for clean data. This means less post-processing and a more efficient motion capture pipeline [11].

Up to 544 digital detectors are embedded in a 12-bar frame, and over 800 active LEDs flash per measurement cycle for complete tracking coverage. A sturdy, ruggedized frame eliminates repetitive camera calibration and tripod alignment headaches: set up the system once and start capturing data immediately.

4.5 ELITE Biomech

ELITE Biomech from BTS of Italy is basedon the latest generation of ELITE systems:

Figure 19. Demo of ELITE Biomech’s out-comes.

ELITE2002. ELITE2002 performs a highly accurate reconstruction of any type of movement, based on the principle of shape recognition of passive markers.

3-D reconstruction and tracking of markers, starting from pre-defined protocol models, have been widely validated by the international scientific community. Tracking based on the principle of shape recognition allows the system to be used under extreme lighting conditions. The system can manage up to 4 force platforms of various brands and up to 32 electromyographic channels. It also performs real-time recognition of markers with on-monitor display during acquisition, and real-time processing of kinematic and analog data, as demonstrated in Figure 19.

4.6 APAS

The Ariel Performance Analysis System (APAS) [12] is the premier product designed, manufactured, and marketed by Ariel Dynamics, Inc. It is an advanced video-based system operating under Windows 95/98/NT/2000. Specific points of interest are digitized with user intervention or automatically using contrasting markers. Additionally, analog data (e.g. force platform, EMG, goniometers) can be collected and synchronized with the kinematic data. Although the system has primarily been used for quantification of human activities, it has

Figure 20. The Polaris system.

also been utilized in many industrial, non-human applications. Optional software modules include real-time 3-D (6-degree-of-freedom) rendering capabilities and full gait pattern analysis utilizing all industry-standard marker sets.

4.7 Polaris

The Polaris system (Northern Digital Inc.) [13] offers flexible real-time tracking for a wide range of purposes, including academic and industrial environments. The system combines simultaneous tracking of both wired and wireless tools (Figure 20).

The whole system can be divided into two parts: the position sensors and the passive or active markers. The former consist of a pair of cameras that are sensitive only to infrared light. This design is particularly useful when the background lighting is varying and unpredictable. Passive markers are covered with reflective materials, which are activated by the arrays of infrared light-emitting diodes surrounding the position sensor lenses. Active markers, in contrast, emit infrared light themselves. The Polaris system provides 6-degree-of-freedom motion information. With proper calibration, the system may achieve 0.35 mm RMS accuracy in position measurements. A basic Polaris with a software development kit (SDK) costs about $2000.

However, similar to other marker-based techniques, the Polaris system cannot resolve the occlusion problem, because of its line-of-sight requirement. Adding extra position sensors may mitigate the problem, but it also increases computational cost and operational complexity.

Figure 21. Demo of Tao and Hu's approach: (a) markers attached to the joints; (b), (c) and (d) marker points captured from three cameras.

4.8 Others

Other commercial marker-based systems aregiven in [14], [15], [16].

By combining with commercial marker-based systems, researchers have developed hybrid techniques for human motion tracking. These systems, although still at the experimental stage, already demonstrate encouraging performance. For example, Tao and Hu [103] built a visual tracking system that exploited both marker-based and marker-free tracking methods. The proposed system consisted of three parts: a patient, video cameras and a PC. The patient's motion was filmed by the video cameras and the captured image sequences were input to the PC. The software on the PC comprised three modules: a motion tracking module, a database module and a decision module. The motion tracking module was formulated in an analysis-by-synthesis framework, similar to the strategy introduced by O'Rourke and Badler [82]. In order to enhance the prediction component, a marker-based motion learning method was adopted: small retro-reflective ball markers were attached to the performer's joints; these reflected infrared light so that the cameras picked up bright points indicating the markers' positions. The 3-D position of each marker was calculated from corresponding 2-D marker points in the image planes via the epipolar constraint, and the skeleton motion of the performer was then deduced [37]. Using perspective camera models, the recovered 3-D model was projected into the 2-D image planes, where it served as the prediction for the matching framework. The performance of this approach is demonstrated in Figure 21.
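Recovering a marker's 3-D position from corresponding 2-D points in two calibrated views can be sketched with linear (DLT) triangulation. This is a generic textbook formulation, not Tao and Hu's exact code; the camera matrices in the example are assumptions:

```python
import numpy as np

def triangulate(P1: np.ndarray, P2: np.ndarray, x1, x2) -> np.ndarray:
    """Linear (DLT) triangulation: recover a 3-D point from its
    projections x1, x2 under the 3x4 camera matrices P1, P2.
    Each image point contributes two rows to a homogeneous system
    A X = 0, solved via SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # null-space solution, homogeneous
    return X[:3] / X[3]
```

With more than two cameras, additional rows are stacked into the same system, which is how redundant views improve accuracy.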

5 Vision based tracking systems without markers

In the previous section, we described the features of marker-based tracking systems, which are restrictive to some degree because of the mounted markers. As a less restrictive motion capture technique, markerless systems are capable of overcoming the mutual occlusion problem, as they are concerned only with the boundaries or features of human bodies. This has been an active and promising, but also challenging, research area over the last decade. Research in this area is still ongoing due to unsolved technical problems.

From a review's point of view, Aggarwal and Cai [17] classified human motion analysis by: body structure analysis (model-based and non-model-based), camera configuration (single and multiple), and correlation platform (state-space and template matching). Gavrila [51] argued that the dimensionality of the tracking space, e.g. 2-D or 3-D, should be the main focus. To be consistent with these existing definitions, we cover all of these issues in this context.

5.1 2-D approaches

As a commonly used framework, 2-D motion tracking concerns only human movement in an image plane, although sometimes a 3-D structure is projected into its image plane for processing purposes. These approaches can be categorized as with or without explicit shape models.

Figure 22. Demonstration of Pfinder by Wren, et al.

5.1.1 2-D approaches with explicit shape models

Due to the arbitrary movements of humans, self-occlusion occurs during human trajectories. To address this problem, one normally uses a priori knowledge about human movements in 2-D by segmenting the human body. For example, Wren et al. [109] presented a region-based approach in which the human body is regarded as a set of "blobs" described by spatial and color Gaussian distributions. To initialize the process, a foreground region is extracted given a background model. The blobs, representing the human hands, head, etc., are then placed over the foreground region. A 2-D contour shape analysis is undertaken to identify the various body parts. The working flowchart is shown in Figure 22.
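The blob idea can be sketched as follows: each body part is a Gaussian over position (and, in Pfinder, also colour), and a pixel is assigned to the blob under which it is most likely. The sketch below uses diagonal spatial Gaussians only, with invented blob parameters; it is a simplification of Pfinder, not its implementation:

```python
import math

def blob_score(pixel, blob) -> float:
    """Log-likelihood of a feature vector under a blob's diagonal
    Gaussian (means blob["mu"], variances blob["var"])."""
    return -0.5 * sum(
        (p - m) ** 2 / v + math.log(2 * math.pi * v)
        for p, m, v in zip(pixel, blob["mu"], blob["var"])
    )

def classify(pixel, blobs) -> str:
    """Assign the pixel to the most likely blob."""
    return max(blobs, key=lambda name: blob_score(pixel, blobs[name]))
```

In Pfinder the feature vector also carries colour channels, so a blob competes on both where a pixel is and what colour it has.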

Akita [18] explored an approach to segmenting and tracking human body parts in common circumstances. To prevent the body tracking from collapsing, he presumed that the human movements are known a priori in certain "key frames". He detected the body parts in a fixed order: legs, head, arms, then trunk. However, owing to these simplifications, his model works only in special situations.

Long and Yang [75] advocated that the limbs of a human silhouette could be tracked based on the shapes of antiparallel lines. They also conducted experimental work to cope with occlusion, i.e. disappearance, merging and splitting. Kurakake and Nevatia [74] attempted to

Figure 24. Computer game on-chip by Free-man, W. et al.

obtain joint locations in images of walking humans by establishing correspondences between extracted ribbons. Their work assumed small motion between two consecutive frames, and feature correspondence was conducted using various geometric constraints.

Shimada et al. [95] suggested achieving rapid and precise estimation of human hand postures by combining 2-D appearance-based and 3-D model-based fitting. First, a rough posture estimate was obtained by image indexing. Each possible hand appearance generated from a given 3-D shape model was labeled by an index obtained through PCA compression and registered with its 3-D model parameters in advance. By retrieving the index of the input image, the method obtained the matched appearance image and its 3-D parameters rapidly. Then, starting from this rough estimate, it estimated the posture and refined the initial 3-D model by model fitting.
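The image-indexing step can be sketched as PCA compression followed by nearest-neighbour lookup in the compressed space. This is a generic reconstruction of the idea, not Shimada et al.'s code; the array shapes and function names are assumptions:

```python
import numpy as np

def build_index(appearances: np.ndarray, k: int = 2):
    """PCA-compress flattened appearance images (one per row) into
    k-dimensional index codes, returned with the mean and basis."""
    mu = appearances.mean(axis=0)
    _, _, Vt = np.linalg.svd(appearances - mu, full_matrices=False)
    basis = Vt[:k]
    codes = (appearances - mu) @ basis.T
    return mu, basis, codes

def lookup(image: np.ndarray, mu, basis, codes) -> int:
    """Return the index of the registered appearance nearest the
    input image in PCA space."""
    c = (image - mu) @ basis.T
    return int(np.argmin(np.linalg.norm(codes - c, axis=1)))
```

Since each registered appearance carries its 3-D model parameters, the returned index yields the rough posture estimate that seeds the subsequent model fitting.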

Figure 23. Human tracking in the approach of Baumberg and Hogg.

5.1.2 2-D approaches without explicit shape models

This topic is addressed more often. Since human movements are non-rigid and arbitrary, the boundaries or silhouettes of the human body are variable and deformable, making them difficult to describe. Tracking of the human body, e.g. the hands, is normally achieved by means of background subtraction or color detection. Furthermore, because models are unavailable, one has to resort to low-level image processing such as feature extraction.

Baumberg and Hogg [21] considered using an Active Shape Model (ASM) for tracking pedestrians (Figure 23). B-splines were used to represent different shapes. The foreground region was first extracted by subtracting the background. A Kalman filter was then applied to accomplish the spatio-temporal operation, similar to the work of Blake et al. [26]. Their work was later extended by automatically generating an improved physically-based model from a training set of examples of the object deforming, tuning the elastic properties of the model to reflect how the object actually deforms. The resulting model provides a low-dimensional shape description that allows accurate temporal extrapolation at low computational cost based on the training motions [22].
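The spatio-temporal smoothing referred to here is a standard Kalman filter. Below is a minimal constant-velocity sketch over a single image coordinate; all noise values are invented for the example and are not taken from Baumberg and Hogg:

```python
import numpy as np

# State is [position, velocity]; only position is observed.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity dynamics
H = np.array([[1.0, 0.0]])               # measurement model
Q = np.eye(2) * 1e-3                     # process noise (assumed)
R = np.array([[0.5]])                    # measurement noise (assumed)

def kf_step(x, P, z):
    """One predict-update cycle of the Kalman filter for measurement z."""
    x, P = F @ x, F @ P @ F.T + Q                   # predict
    S = H @ P @ H.T + R                             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
    x = x + K @ (z - H @ x)                         # update state
    P = (np.eye(2) - K @ H) @ P                     # update covariance
    return x, P
```

In a 2-D tracker the same recursion runs over the full shape-parameter state rather than a single coordinate.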

Freeman et al. [49] developed a special detector for on-chip computer games (Figure 24), which infers useful information about the position, size, orientation, or configuration of human body parts. Two algorithms were used: one used image moments to calculate an equivalent rectangle for the current image, and the other used orientation histograms to select the body pose from a menu of templates.
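The image-moments algorithm can be sketched directly: the zeroth and first moments give the centroid, and the central second moments give the orientation of an equivalent rectangle. A pure-Python illustration of the principle, not Freeman et al.'s on-chip code:

```python
import math

def equivalent_rectangle(img):
    """Centroid and orientation of a binary (or grey-level) image from
    its raw and central moments; img is a list of rows of intensities."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v; m10 += x * v; m01 += y * v
    cx, cy = m10 / m00, m01 / m00
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            mu20 += v * (x - cx) ** 2
            mu02 += v * (y - cy) ** 2
            mu11 += v * (x - cx) * (y - cy)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)   # principal-axis angle
    return (cx, cy), theta
```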

Cordea et al. [39] discussed a 2.5-dimensional tracking method allowing real-time recovery of the 3-D position and orientation of a head moving in its image plane. The method used a 2-D elliptical head model, region- and edge-based matching algorithms, and a linear Kalman filter estimator. The tracking system worked in realistic situations: without makeup on the face, with an uncalibrated camera, and with unknown lighting conditions and background.

Fablet and Black [47] proposed a solution for the automatic detection and tracking of human motion using 2-D optical flow information, which provides rich descriptive cues while being independent of object and background appearance. To represent the optical flow patterns of people from arbitrary viewpoints, they developed a novel representation of human motion using low-dimensional spatio-temporal models learned from motion capture data of human subjects. In addition to human motion (the foreground) they modelled the motion of generic scenes (the background); these statistical models were defined as Gibbsian fields specified from the first-order derivatives of the motion observations. Detection and tracking were posed in a principled Bayesian framework involving the computation of a posterior probability distribution over the model parameters. A particle filter was then used to represent and predict this non-Gaussian posterior distribution over time. The model parameters of samples from this distribution were related to the pose parameters of a 3-D articulated model.
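The particle-filter machinery mentioned above can be sketched minimally. The code below is a generic bootstrap filter for a 1-D random-walk state observed in Gaussian noise, illustrating only the predict/weight/resample cycle; all noise parameters are invented for the example and it is not Fablet and Black's model:

```python
import math
import random

def particle_filter(observations, n=500):
    """Bootstrap particle filter: diffuse the particles (predict),
    weight them by the observation likelihood, then resample."""
    particles = [random.gauss(0.0, 1.0) for _ in range(n)]   # prior
    estimates = []
    for z in observations:
        particles = [p + random.gauss(0.0, 0.3) for p in particles]   # predict
        weights = [math.exp(-0.5 * (z - p) ** 2 / 0.5) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]                        # normalize
        particles = random.choices(particles, weights=weights, k=n)   # resample
        estimates.append(sum(particles) / n)                          # posterior mean
    return estimates
```

Because the sample set can carry several modes at once, this representation handles the non-Gaussian posteriors that arise in human tracking, where a Kalman filter could not.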

5.2 3-D approaches

These approaches attempt to recover 3-D articulated poses over time [51]. A 3-D model is usually projected into a 2-D image for subsequent processing, owing to the use of image appearance and dimensionality reduction.

5.2.1 3-D modelling

Modelling human movements a priori allows the tracking problem to be simplified: the future movements of the human body can be predicted regardless of self-occlusion or self-collision. O'Rourke and Badler [82] discovered that prediction in state space seemed more stable than prediction in image space, owing to the semantic knowledge incorporated in the former. Their tracking framework included four components: prediction, synthesis, image analysis, and state estimation. This strategy has been applied in most existing tracking systems.

Model-based approaches include stick figures, volumetric models, and mixtures of the two.

5.2.2 Stick figure

The stick figure represents the skeletal structure, normally regarded as a collection of segments and joint angles (Figure 25). Bharatkumar et al. [23] used stick figures to model the lower limbs, e.g. the hip, knees, and ankles. They applied a medial-axis transformation to extract 2-D stick figures of the lower limbs.

Chen and Lee [37] first applied geometric projection theory to obtain a set of feasible postures from a single image, then made use of the given dimensions of the human stick figure, together with physiological and motion-specific knowledge, to constrain the feasible postures in both single-frame and multi-frame analysis. Finally, a unique gait interpretation was selected by an optimization algorithm.

Huber's human model [65] was a refined version of the stick-figure representation. Joints were connected by line segments with a certain degree

Figure 25. Stick figure of human body (imagecourtesy of Freeman, W.T.).

of constraint that could be relaxed using "virtual springs". This model behaved as a mass-spring-damper system. A proximity space (PS) was used to confine the motion and stereo measurements of the joints, starting from the human head and extending to the arms and torso through expansion of the PS.

By modelling the human body with 14 joints and 15 body parts, Ronfard et al. [93] attempted to find people in static video frames using learned models of both the appearance of body parts (head, limbs, hands) and the geometry of their assemblies. They built on Forsyth and Fleck's general 'body plan' methodology and Felzenszwalb and Huttenlocher's dynamic programming approach for efficiently assembling candidate parts into 'pictorial structures'. However, they replaced the rather simple part detectors used in those works with dedicated detectors learned for each body part using Support Vector Machines (SVMs) or Relevance Vector Machines (RVMs). RVMs are SVM-like classifiers that offer a well-founded probabilistic interpretation and improved sparsity for reduced computation. Their benefits were demonstrated experimentally in a series of results showing great promise for learning detectors in more general situations.

Further technical reports are given in [50], [61],[67], [81], [86].

5.2.3 Volumetric modeling

Elliptical cylinders are one of the volumetric models used to model the human body. Hogg [60] and Rohr [92] extended the work of Marr and Nishihara [78], which used elliptical cylinders to represent the human body. Each cylinder has three parameters: the length of its axis and the major and minor axes of its elliptical cross-section. The coordinate system originates from the center of the torso. The difference between the two approaches is that Rohr used eigenvector line fitting to project the 2-D image onto the 3-D human model.

Rehg et al. [87] represented two occluded fingers using several cylinders, whose center axes were projected onto the center line segments of the 2-D finger images. Goncalves et al. [52] modelled both the upper and lower arm as truncated circular cones, with the shoulder and elbow presumed to be spherical joints. A 3-D arm model was projected onto the image plane and then fitted to the blurred image of a real arm. The matching was achieved by minimizing the error between the model projection and the real image while adapting the size and orientation of the model.

Chung and Ohnishi [38] proposed a 3-D model-based motion analysis using cue circles (CC) and cue spheres (CS). Stereo matching for reconstructing the body model was performed by finding pairs of CCs between the pair of contour images investigated. Each CS was projected back onto the two image planes with its corresponding CCs.

Theobalt et al. [105] combined efficient real-time optical feature tracking with reconstruction of the volume of a moving subject in order to fit a sophisticated humanoid skeleton to the video footage. The scene was observed with 4 video cameras, two connected to one PC (Athlon 1 GHz). The system consisted of two parts: a distributed tracking and visual-hull reconstruction system (the online component), and a skeleton-fitting application that took recorded sequences as input. For each view, the moving person was separated from the background by statistical background subtraction. In the initial frame, the silhouette of the person seen from the 2 front-view cameras was separated into distinct regions using a Generalized Voronoi Diagram decomposition, so that the locations of the hands, head and feet could be identified. In the front camera view, these body parts were then tracked in all subsequent video frames and their 3-D locations reconstructed. In addition, a voxel-based approximation of the visual hull was computed for each time step. Experimental volumetric data are shown in Figure 26.

Figure 26. Volumetric modelling by Theobalt, C.

5.3 Camera configuration

The tracking problem can also be tackled by proper camera setup. The literature covers both single-camera and distributed-camera configurations. Using multiple cameras requires a common spatial reference, whereas a single camera has no such requirement. However, a single camera often suffers from occlusion of the human body due to its fixed viewing angle. A distributed-camera strategy is therefore the better option for minimizing this risk.

5.3.1 Single camera tracking

Polana and Nelson [84] observed that the movements of the arms and legs converge on that of the torso. Each walking-person image was bounded by a rectangular box, and the centroid of the bounding box was treated as the feature to track. The positions of the center point in previous frames were used to predict its current position. In this way, tracking remained correct even when two subjects occluded each other in the middle of the image sequences.
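The centroid-prediction scheme can be sketched in two small functions: extrapolate the next position from the last two, then pick the detection nearest the prediction. The function names and the nearest-match association are illustrative assumptions, not taken from [84]:

```python
def predict_centroid(history):
    """Constant-velocity extrapolation from the two most recent
    bounding-box centroids."""
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)

def associate(prediction, detections):
    """Pick the detection nearest the predicted position, which helps
    keep a subject's identity through partial occlusion."""
    px, py = prediction
    return min(detections, key=lambda d: (d[0] - px) ** 2 + (d[1] - py) ** 2)
```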

Sminchisescu and Triggs [98] presented a method for recovering 3-D human body motion from monocular video sequences using robust image matching, joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3-D body tracking is challenging: for reliable tracking, at least 30 joint parameters need to be estimated, subject to highly nonlinear physical constraints; the problem is chronically ill-conditioned, as about 1/3 of the degrees of freedom (the depth-related ones) are almost unobservable in any given monocular image; and matching an imperfect, highly flexible, self-occluding model to cluttered image features is intrinsically hard. To reduce correspondence ambiguities they used a carefully designed robust matching-cost metric that combined robust optical flow, edge energy, and motion boundaries. Even so, the ambiguity, nonlinearity and non-observability made the parameter-space cost surface multi-modal, unpredictable and ill-conditioned, so minimizing it is difficult. They discussed the limitations of CONDENSATION-like samplers, and introduced a novel hybrid search algorithm combining inflated-covariance-scaled sampling with continuous optimization subject to physical constraints. Experiments on challenging monocular sequences showed that robust cost modelling, joint and self-intersection constraints, and informed sampling were all essential for reliable monocular 3-D body tracking.

Bowden et al. [29] advocated a model-based approach to human body tracking in which the 2-D silhouette of a moving human and the corresponding 3-D skeletal structure were encapsulated within a non-linear Point Distribution Model. This statistical model allowed a direct mapping between the external boundary of a human and the anatomical positions. They showed that this information, along with the positions of landmark features, e.g. the hands and head, could be used to reconstruct the pose and structure of the human body from a monoscopic view of a scene.

Barron and Kakadiaris [20] presented a simple, efficient, and robust method for recovering 3-D human motion from an image sequence obtained with an uncalibrated camera. The algorithm included an anthropometry initialization step, assuming that the similarity of the subject's appearance over the acquisition period leads to the minimum of a convex function on the degrees of freedom of a Virtual Human Model (VHM). The method searched for the best pose in each image by minimizing discrepancies between the image under consideration and a synthetic image of an appropriate VHM. By including in the objective function penalty factors from the image segmentation step, the search focused on regions belonging to the subject. These penalty factors made the objective function convex, which guaranteed that the minimization converged to a global minimum.

To reduce the side-effects of hard kinematic constraints, Dockstader et al. [44] proposed a new model-based approach to three-dimensional (3-D) tracking and extraction of gait and human motion. They suggested a hierarchical, structural model of the human body that introduced the concept of soft kinematic constraints. These constraints took the form of a priori stochastic distributions learned from previous configurations of the body during specific activities; they were used to supplement an existing motion model limited by hard kinematic constraints. Time-varying parameters of the structural model were also used to measure gait velocity, stance width, stride length, stance times, and other gait variables with multiple degrees of accuracy and robustness. To characterize tracking performance, a novel geometric model of expected tracking failures was then introduced.

Figure 27. Human motion tracking by Black, M.J. et al.

Yeasin and Chaudhuri [113] proposed a simple, inexpensive, portable and real-time image processing system for kinematic analysis of human gait. They viewed this as a feature-based multi-target tracking problem. They tracked artificially induced features appearing in the image sequence, due to non-impeding contrast markers attached at different anatomical landmarks of the subject under analysis. The paper described a real-time algorithm for detecting and tracking feature points simultaneously. By applying a Kalman filter, they recursively predicted tentative feature locations and retained the predicted point in cases of occlusion. A path-coherence score was used for disambiguation during tracking and for establishing feature correspondences. Experiments on normal and pathological subjects with different gaits were performed, and the results illustrated the efficacy of the algorithm. Similar algorithms can be found in [115] and [114].

Further to the application of optical flow in motion learning, Black et al. [25] proposed a framework for learning parameterized models of optical flow from image sequences. A class of motions is represented by a set of orthogonal basis flow fields computed from a training set using principal component analysis. Many complex image motion sequences can be represented by a linear combination of a small number of these basis flows. The learned motion models may be used for optical flow estimation and for model-based recognition. They described a robust, multi-resolution scheme for directly computing the parameters of the learned flow models from image derivatives. Examples included learning motion discontinuities, non-rigid motion of humans, and articulated human motion. Later, Sidenbladh et al. [96], also in [97], extended the work of [25] into a generative probabilistic method for tracking 3-D articulated human figures in monocular image sequences (see the example shown in Figure 27). These ideas are similar to [66], which was further extended in [111], [112].
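The basis-flow representation can be sketched with PCA: learn orthogonal basis flow fields from a training set of flows, then express a new flow as the mean plus a linear combination of the basis. This is an illustrative reconstruction of the idea in [25], with toy array shapes:

```python
import numpy as np

def learn_basis(flows: np.ndarray, k: int):
    """Learn k orthogonal basis flow fields from a training set
    (one flattened flow field per row) via PCA."""
    mu = flows.mean(axis=0)
    _, _, Vt = np.linalg.svd(flows - mu, full_matrices=False)
    return mu, Vt[:k]

def reconstruct(flow: np.ndarray, mu: np.ndarray, basis: np.ndarray):
    """Project a flow field onto the learned basis and rebuild it as
    the mean plus a linear combination of the basis flows."""
    coeffs = basis @ (flow - mu)
    return mu + basis.T @ coeffs
```

Estimating only the k coefficients, rather than a full dense flow field, is what makes the learned models usable inside a tracking loop.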

5.3.2 Multiple camera tracking

To enlarge the monitored area and to avoid the disappearance of subjects, a distributed-camera setup can resolve the matching ambiguity that arises when subjects occlude each other. Cai and Aggarwal [34] used multiple points belonging to the medial axis of the human upper body as the features to track. These points were sparsely sampled and assumed to be independent of each other. The location and average intensity of the feature points were combined to find the most likely match between two neighboring frames. Multivariate Gaussian distributions were assumed for the class-conditional probability density function of the features of candidate subject images. It was shown that such a system with three cameras could operate indoors in real time.

Sato et al. [94] represented a moving person as a combination of blobs of its body parts. All the cameras were calibrated in advance with respect to a CAD model of the indoor environment. Blobs were matched using their area, brightness, and 3-D position in world coordinates. The 3-D position of a 2-D blob was estimated on the basis of its height, retrieved from the distance between the blob's center of mass and the floor.

Ringer and Lasenby [90] proposed to use markers placed at the joints of the arm(s) or leg(s) being analyzed, as shown in Figure 28. The location


Figure 28. Applications of multiple cameras in human motion tracking by Ringer and Lasenby.

of these markers on a camera's image plane provided the input to the tracking system, with the result that the required parameters of the body could be estimated to far greater accuracy than one could obtain in the markerless case. This scheme used a number of video cameras to obtain complete and accurate information on the 3-D location and motion of bodies over time. Based on the extracted kinematic and measurement models, extended Kalman filter (EKF) and particle filter tracking strategies were compared in their application to updating state estimates. The results showed that the EKF was preferred owing to its lower computational demands.
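The EKF used in such trackers follows the standard predict/correct cycle. The sketch below is generic, with a toy constant-velocity joint-angle model standing in for the full articulated kinematics of [90]; the noise values and measurements are invented.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One extended-Kalman-filter predict/update cycle.

    x, P : state estimate and covariance
    z    : measurement (e.g. marker positions on the image plane)
    f, h : process and measurement functions; F, H their Jacobians at x
    Q, R : process and measurement noise covariances
    """
    # Predict.
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update.
    y = z - h(x_pred)                       # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy constant-velocity joint-angle model: state = [angle, angular rate].
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])                  # only the angle is observed
f = lambda x: F @ x
h = lambda x: H @ x
x, P = np.zeros(2), np.eye(2)
for z in [0.1, 0.22, 0.31]:
    x, P = ekf_step(x, P, np.array([z]), f, F, h, H,
                    Q=1e-3 * np.eye(2), R=np.array([[1e-2]]))
```

A particle filter replaces the single Gaussian (x, P) with a weighted sample set, which is why it costs more per frame — the trade-off the authors evaluated.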

Bodor et al. [27] introduced a method for employing image-based rendering to extend the range of use of human motion recognition systems. Input views orthogonal to the direction of motion were created automatically by constructing the proper view from a combination of non-orthogonal views taken from several cameras. Image-based rendering was utilized in two ways: (1) to generate additional training sets for these systems containing a large number of non-orthogonal views, and (2) to generate orthogonal views from a combination of non-orthogonal views from several cameras.

Multiple cameras are needed to completely cover an environment for monitoring activity. To track people successfully in multiple perspective imagery, one has to establish correspondence between objects captured in multiple cameras. Javed et al. [68] presented a system for tracking people in multiple uncalibrated cameras. The system was able to discover spatial relationships between the camera fields of view and used this information to establish correspondence between different perspective views of the same person. They explored the novel approach of finding the limits of the field of view of a camera as visible in other cameras, which helped disambiguate between possible candidates for correspondence.

5.4 Segmentation of human motion

Spatio-temporal segmentation, illustrated in Figure 29, is vital in vision-related analysis owing to the required reconstruction of dynamic scenes. Spatial segmentation attempts to extract moving objects from their background and to divide a complicated motion stream into a set of simple and stable motions [55]. In order to fully depict human motion in constraint-free environments, people have explored a variety of motion segmentation strategies, comprising both model-based and appearance-based approaches [55], [116], [77], [54], [99]. Nevertheless, this does not mean that segmentation is achieved independently. Instead, motion segmentation is normally encoded within the tracking procedure, where it acts as an assistive tool and descriptor.

Gonzalez et al. [53] estimated the motion flows of features on the human body using a standard tracker. Given a pair of subsequent images, an affine fundamental matrix was estimated from four pairs of corresponding feature points such that the number of


Figure 29. Segmentation of the human body, by Theobalt, C.

other feature points undergoing the affine motion modelled by the matrix was maximized [118]. In the remaining subsequent frames, the feature points corresponding to those used for the first fundamental matrix continued to estimate an affine motion model. At the last pair of frames, this yielded a set of feature points identified as belonging to the same limb and hence undergoing the same motion over the whole sequence. By repeating this estimate-and-sort-out process over the remaining feature points, the different limb motions were finally segmented.
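The estimate-and-sort-out idea can be illustrated with a RANSAC-style grouping. This is a simplified stand-in for the fundamental-matrix formulation of [53]: it uses a plain 2-D affine motion model and synthetic correspondences invented for the example.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine model: dst ≈ [src | 1] @ params, params (3, 2)."""
    M = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)
    return params

def affine_inliers(src, dst, params, tol):
    """Boolean mask of points explained by the model within tol pixels."""
    pred = np.hstack([src, np.ones((len(src), 1))]) @ params
    return np.linalg.norm(pred - dst, axis=1) < tol

def segment_motions(src, dst, tol=0.5, iters=200, seed=0):
    """Repeatedly find the dominant affine motion (RANSAC-style) and peel off
    the feature points consistent with it: one group per limb motion."""
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(src))
    groups = []
    while len(remaining) >= 3:
        best = np.zeros(len(remaining), dtype=bool)
        for _ in range(iters):
            sample = rng.choice(len(remaining), size=3, replace=False)
            params = fit_affine(src[remaining][sample], dst[remaining][sample])
            inliers = affine_inliers(src[remaining], dst[remaining], params, tol)
            if inliers.sum() > best.sum():
                best = inliers
        if best.sum() < 3:
            break
        groups.append(remaining[best])
        remaining = remaining[~best]
    return groups

# Two synthetic "limbs": points 0-9 translate right, points 10-19 translate up.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 100, size=(20, 2))
moved = pts.copy()
moved[:10] += np.array([5.0, 0.0])
moved[10:] += np.array([0.0, 5.0])
groups = segment_motions(pts, moved)
```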

To obtain a high-level interpretation of human motion in a video stream, one has to first detect body parts. Hilti et al. [59] proposed to combine pixel-based skin colour segmentation and motion-based segmentation for human motion tracking. Skin colour was chosen because it is orientation invariant and fast to detect. The human face normally presents a large skin surface in a flesh tone, which is quite similar from person to person and even across various races [100]. Using hue and saturation (HS) as inputs, a colour map was converted to a filtered image in which each pixel is associated with a likelihood of being skin [85]. To compensate for the impact of lighting changes, the motion-based segmentation was implemented adaptively to exogenous changes.
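A toy version of the per-pixel skin-likelihood computation is shown below, using a single Gaussian over hue-saturation in place of a learned colour map; the model parameters and threshold are illustrative, not taken from [59] or [85].

```python
import numpy as np

def skin_likelihood(hs_pixels, mean, cov):
    """Per-pixel likelihood of being skin, from (hue, saturation) values,
    under a single-Gaussian skin-colour model."""
    d = hs_pixels - mean
    inv = np.linalg.inv(cov)
    mahal = np.einsum('ij,jk,ik->i', d, inv, d)   # squared Mahalanobis distance
    norm = 2 * np.pi * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm

# Illustrative flesh-tone model: low hue, medium saturation (both in [0, 1]).
mean = np.array([0.05, 0.5])
cov = np.diag([0.001, 0.02])
pixels = np.array([[0.05, 0.5],    # skin-like
                   [0.60, 0.9]])   # not skin-like
lik = skin_likelihood(pixels, mean, cov)
mask = lik > 1e-3                  # threshold into a binary skin mask
```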

Bradski and Davies [30] presented a fast and simple method using a timed motion history image

(tMHI) for representing motion from gradients in successively layered silhouettes. The segmented regions were not “motion blobs” but motion regions naturally connected to parts of moving objects. This was motivated by the fact that segmentation by collecting “blobs” of similar-direction motion from frame to frame using optical flow [41] did not guarantee the correspondence of the motion over time. By labeling motion regions connected to the current silhouette using a downward-stepping floodfill, areas of motion were directly attached to parts of the object of interest.
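The core tMHI update is compact: stamp the pixels of the current silhouette with the timestamp and clear anything older than the history duration. A NumPy sketch (not the authors' implementation), with tiny invented silhouettes:

```python
import numpy as np

def update_tmhi(mhi, silhouette, timestamp, duration):
    """Timed motion history image update: current-silhouette pixels take the
    timestamp; pixels older than `duration` are cleared to zero."""
    mhi = mhi.copy()
    mhi[silhouette > 0] = timestamp
    mhi[mhi < timestamp - duration] = 0
    return mhi

mhi = np.zeros((4, 4))
sil1 = np.zeros((4, 4)); sil1[0, 0] = 1   # silhouette at time 1.0
sil2 = np.zeros((4, 4)); sil2[1, 1] = 1   # silhouette at time 2.0
mhi = update_tmhi(mhi, sil1, timestamp=1.0, duration=0.5)
mhi = update_tmhi(mhi, sil2, timestamp=2.0, duration=0.5)
```

Gradients of the resulting image point along the direction of motion, which is what the segmentation step exploits.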

Moeslund and Granum [79] suggested using colour information to segment the hand and head. Owing to the sensitivity of the original RGB-based colours to lighting intensity, they used chromatic colours, which were normalised according to intensity. In order to determine dance motion, human observers were shown video and 3-D motion capture sequences on a video display [70]. Observers were asked to define gesture boundaries within each microdance, which was analyzed to compute the local minima in the force of the body. At the moment of each of these local minima, the force, momentum, and kinetic energy parameters were examined for each of the lower body segments. For each segment a binary triple was computed, providing a complete characterization of all the body segments at each instant when body acceleration was at a local minimum.
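The chromatic-colour normalisation is essentially a one-liner: dividing each channel by the pixel's total intensity makes the result largely invariant to lighting intensity.

```python
import numpy as np

def chromatic(rgb):
    """Normalise RGB by intensity: r = R/(R+G+B), g = G/(R+G+B).
    The (r, g) pair is largely invariant to lighting intensity."""
    rgb = np.asarray(rgb, dtype=float)
    s = rgb.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0                     # avoid division by zero on black pixels
    return rgb[..., :2] / s

bright = chromatic([[200, 120, 80]])
dim = chromatic([[100, 60, 40]])        # same surface at half the intensity
```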

6 Robot-guided tracking systems

In this section, one can find a rich variety of rehabilitation systems that are driven by electromechanical or electromagnetic tracking strategies. These systems, called robot-guided systems hereafter, incorporate sensor technologies to conduct “move-measure-feedback” training strategies.

6.1 Discriminating static and dynamic activities

To distinguish static and dynamic activities (standing, sitting, lying, walking, ascending stairs, descending stairs, cycling), Veltink et al. [107] presented a new approach to monitoring


ambulatory activities for use in the domestic environment, which uses two or three uniaxial accelerometers mounted on the body. They carried out a set of experiments on static and dynamic characteristics. First, the discrimination between static and dynamic activities was studied; it was shown that static activities could be distinguished from dynamic ones. Second, the distinction between static activities was investigated. Standing, sitting and lying could be distinguished by the output of two accelerometers, one mounted tangentially on a thigh and the other sited on the sternum. Third, the distinction between a few cyclical dynamic activities was examined.

As a result, it was concluded that the “discrimination of dynamic activities on the basis of the combined evaluation of the mean signal value and signal morphology is therefore proposed”. This ruled out the standard deviation of the signal and the cycle time as indexes for discriminating activities. The performance of dynamic activity classification on the basis of signal morphology needs to be improved in future work. The authors noted that averaging adjacent motion cycles might reduce the standard deviations of signal correlation and so improve measurement performance. As a further study, Uiterwaal et al. [106] developed a measurement system using accelerometry to assess a patient's functional physical mobility in non-laboratory situations. Every second, the video recording was compared to the measurement from the proposed system, and it showed that the validity of the system was up to 93%.
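The static/dynamic discrimination can be sketched as a two-stage rule on a windowed accelerometer signal: a variability test separates static from dynamic windows, and the mean value (which reflects sensor inclination) then separates static postures. The thresholds and sampling rate below are illustrative, not the ones from [107].

```python
import numpy as np

def classify_window(acc, std_thresh=0.1):
    """Label a window of one-axis accelerometer samples (in g) as static or
    dynamic; for static windows, guess the posture from the mean value.
    For a thigh-mounted tangential axis, a mean near 1 g suggests sitting
    (thigh horizontal) and near 0 g suggests standing (thigh vertical)."""
    if acc.std() > std_thresh:
        return 'dynamic'
    return 'sitting' if acc.mean() > 0.5 else 'standing'

fs = 50                                     # assumed 50 Hz sampling
t = np.arange(0, 2, 1 / fs)
standing = np.zeros_like(t) + 0.02          # quiet, near-zero signal
walking = 0.4 * np.sin(2 * np.pi * 2 * t)   # gait-like 2 Hz oscillation
print(classify_window(standing), classify_window(walking))  # → standing dynamic
```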

6.2 Typical working systems

6.2.1 Cozens

To determine whether motion tracking techniques can assist simple active upper limb exercises for patients recovering from neurological diseases such as stroke, Cozens [40] reported a pilot study applying torque to an individual joint, combined with EMG measurement that indicated the pattern of arm movement in exercises. The evidence showed that greater assistance tended to be given

Figure 30. The MANUS in MIT.

to patients with more limited exercise capacity. However, this work was only able to demonstrate the principle of assisting single-limb exercise using a 2-D technique. Therefore, a real system was expected to be developed for realistic therapeutic exercises, which may contain “three degrees of freedom at the shoulder and two degrees of freedom at the elbow”.

6.2.2 MANUS

To find out whether exercise therapy influences plasticity and recovery of the brain following a stroke, a tool is needed to control the amount of therapy delivered to a patient while, where appropriate, objectively measuring the patient's performance. In other words, a system is required to “move smoothly and rapidly to comply with the patients' actions” [73]. Furthermore, abnormally low or high muscle tone may misguide a therapy expert into applying the wrong forces to achieve the desired motion of limb segments. To address these problems, a novel automatic system, named MIT-MANUS (Figure 30), was designed to move, guide, or perturb the movement of a patient's upper limb, whilst recording motion-related quantities, e.g. position, velocity, or forces applied [73] (Figure 31). The experimental results were so promising that commercialization of the established system was under way. However, it was noted that the biological basis of recovery and individual patients' needs should be further studied in order to improve the performance of the system in different circumstances. These findings were also confirmed in [72].


Figure 31. Image courtesy of Krebs, H.I.

6.2.3 Taylor and improved systems

Taylor [104] described an initial investigation in which a simple two-DOF arm support was built to allow movements of the shoulder and elbow in a horizontal plane. Based on this simple device, he then suggested a five-DOF exoskeletal system to allow activities of daily living (ADL) to be performed in a natural way. The design was validated by tests which showed that the “configuration interfaces properly with the human arm”, making the addition of goniometric measurement sensors for identification of arm position and pose trivial.

Another good example was shown in [89], where a device was designed to assist elbow movements. This elbow exerciser was strapped to a lever, which rotated in a horizontal plane. A servomotor driven through a current amplifier was applied to drive the lever, and a potentiometer indicated the position of the motor. The position of the lever was obtained using a semi-circular array of light-emitting diodes (LEDs) around the lever. However, this system required a physiotherapist to activate the arm movement and to use a force handle to measure the forces applied. The system was of limited use to patients, as realistic physiotherapy exercises normally occur in three dimensions; accordingly, a three-DOF prototype was advised instead.

To cope with the problems faced by individuals with spinal cord injuries, Harwin and Rahman [56] explored the design of a head-controlled force-reflecting master-slave telemanipulator for rehabilitation applications. This approach was

further expanded for a similar class of assistive devices that may support and move the person's arm in a programmed way. Enclosed within the system, a test-bed power-assisted orthosis consisted of a six-DOF master with the end effector replaced by a six-axis force/torque sensor. A splint assembly was mounted on the force/torque sensor and supported the person's arm. The base-level control system first subtracted the weight of the person's arm from the whole measurement. Control algorithms were established to relate the estimate of the patient's residual force to system position, velocity and acceleration [101]. These characteristic parameters are desired in regular movement analysis. Using a similar technique, Chen et al. [36] provided a comprehensive justification of their proposal and testing protocols.
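The gravity-compensation step amounts to subtracting the pre-measured arm weight vector from the force/torque reading. A sketch, assuming gravity acts along -z in the sensor frame and using invented readings:

```python
import numpy as np

def residual_force(measured, arm_weight_n):
    """Subtract the (pre-measured) weight of the supported arm from the
    force sensor reading to recover the person's residual effort.
    Assumes gravity acts along -z in the sensor frame."""
    gravity = np.array([0.0, 0.0, -arm_weight_n])
    return measured - gravity

measured = np.array([1.0, 0.0, -20.0])    # sensor reading in newtons
effort = residual_force(measured, arm_weight_n=18.0)
```

Only the residual effort is then fed to the control algorithms relating force to position, velocity and acceleration.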

6.2.4 MIME

Burgar et al. [33], [76] summarized systems for post-stroke therapy conducted at the Department of Veterans Affairs Palo Alto in collaboration with Stanford University. The original principle had been established with two- or three-DOF elbow/forearm manipulators. Amongst these systems, the MIME, shown in Figure 32, was the most attractive owing to its ability to fully support the limb during 3-D movements and its self-guided modes of therapy. Subjects were seated in a wheelchair close to an adjustable-height table. A PUMA-560 robot was mounted beside the table and attached to a wrist-forearm orthosis (splint) via a six-axis force transducer. A position digitizer quantified movement kinematics. Clinical trials showed that the improvements in the elbow were better captured by the biomechanical measures than by the clinical ones. The disadvantage of this system is that it did not allow the subject to move his/her body freely.

6.2.5 ARM-Guide

A rehabilitator named the “ARM Guide” [88] was presented to diagnose and treat arm movement impairment following stroke and other brain injuries. Several vital motor impairments, such as abnormal tone, incoordination, and weakness, could


Figure 32. The MIME in MIT.

be evaluated. Pre-clinical results showed that this therapy produced quantifiable benefits in the chronic hemiparetic arm. In the design, the subject's forearm was strapped to a specially designed splint that “slides along the linear constraint”. A motor drove a chain drive attached to the splint, and an optical encoder mounted on the motor indicated the arm position. The forces produced by the arm were measured by a six-axis load cell mounted between the splint and the linear constraint. Although the system achieved considerable success, its efficacy and practicality need further development.

6.3 Other relevant techniques

Although the following example might not be directly relevant to arm training systems, it still provides some hints for constructing a motion tracking system. Hesse and Uhlenbrock [58] introduced a newly developed gait trainer allowing wheelchair-bound subjects to take repetitive practice of a gait-like movement without overstressing therapists. It consisted of two footplates positioned on two bars, two rockers, and two cranks that provided the propulsion. The system generated a different movement of the tip and of the rear of the footplate during the swing. In addition, the crank propulsion was controlled by a planetary system to provide a ratio of 60 to 40 percent between the stance and swing phases. Two cases of non-ambulatory patients who regained their walking ability after four weeks of daily training on the gait trainer were positively reported.

A number of projects have been undertaken on human arm trajectories. However, to make the

proposed systems feasible for non-trained users, further studies need to be performed on the development of a patient interface and therapist workspace. For example, to improve the performance of haptic interfaces, many researchers have exhibited successful prototype systems, e.g. [19], [57]. Hawkins et al. [57] set up an experimental apparatus consisting of a frame with one chair, a wrist connection mechanism, two embedded computers, a large computer screen, an exercise table, a keypad and a three-DOF haptic interface arm. The user “was seated on the chair with their wrist connected to the haptic interface through the wrist connection mechanism. The device end-effector consisted of a gimbal which provides an extra three DOF to facilitate wrist movement.” These tests encourage the exploration of a novel system in which a patient can move his/her arm consistently, smoothly, and correctly, and from which a friendly, human-like interface between the system and the user can afterwards be obtained.

Comprehensive reviews on rehabilitation sys-tems are given in the literature [42] and [43].

7 Discussion

7.1 Remaining challenges

The characteristics of the previous tracking systems have been summarized earlier. It is important to understand the key problems addressed by these systems: identifying the remaining challenges allows one to specify the aims of further development in future work.

All the previous systems required therapists to attend during training courses. Without the help of a therapist, these systems either were unable to run successfully or simply lost their controlling commands. The developed systems performed as supervised machines that merely followed orders from the on-site therapists. Therefore, they did not feasibly achieve patient-guided manipulation therapy, and so they cannot yet be used directly in homes.

The second challenge is cost. People have tended to build complicated systems in order to serve multiple purposes. This leads to expensive components being applied in the designed systems. Some of these systems also consisted of specially designed movement sensors, which limit the further development and broad application of the designed systems.

Inconvenience is another obvious challenge in the previous systems. Most systems required people to sit in front of a table or in a chair. This configuration constrains mobility, so such systems are not helpful for enhancing the overall training of human limbs.

Owing to the large working space requested by these systems, patients had to prepare spacious recovery rooms to set them up. As a consequence, this prevents people who have less accommodation space from using such systems to regain their mobility. Alternatively, a compact telemetric system that copes with the space problem should be proposed instead.

Poor performance of the human-computer interfaces (HCI) designed for these systems has been recognized. Unfortunately, this issue has seldom been touched, as the other main technical problems had not yet been solved. However, a bad HCI might stop post-stroke patients from actively using any training system.

Generally speaking, when one considers a recovery system, six issues need to be taken into account: cost, size, weight, functional performance, ease of operation, and automation.

7.2 Design specification for a proposed system

Consider a system aimed at limb rehabilitation training for stroke patients during their recovery. The designer has to be concerned mainly with the following issues:

Real-time operation of the tracking system is required so that arm movement can be recorded as it occurs;

Human movement must not be limited to a particular workspace, so telemetry is considered for transmitting data from the mounted sensors to the workstation;

The proposed system shall not impose any cumbersome tasks on the user;

Human movement parameters shall be properly and accurately represented on the computer terminal;

A friendly graphical interface between the system and the user is vital owing to its application in home-based situations;

The whole system needs to be flexibly attached or installed in a domestic site.

8 Conclusions

A number of applications have already been developed to support various forms of health and social care delivery. It has been shown that rehabilitation systems are able to assist or replace face-to-face therapy. Unfortunately, evidence also shows that human movement has a very complicated physiological nature, which prevents further development of the existing systems. People hence need insight into the formulation of human movement. Our proposed project will cope with this technical issue by attempting to capture human motion at each moment. Achieving such an accurate localization of the arm may lead to efficient, convenient and cheap kinetic and kinematic modelling for movement analysis.

Acknowledgements

We are grateful for the provision of partial literature sources by Miss Nargis Islam at the University of Bath and Dr Huiru Zheng at the University of Ulster.

References

[1] http://www.polyu.edu.hk/cga/faq/.
[2] http://www.microstrain.com/.
[3] http://www.polhemus.com/.
[4] http://home.flash.net/hpm/.
[5] http://www.vrealities.com/cyber.html.
[6] http://www.5dt.com/.
[7] http://www.fakespace.com.
[8] http://www.exos.com.
[9] http://www.qualisys.se/.
[10] http://www.charndyn.com/.
[11] http://www.ascension-tech.com/.
[12] http://www.arielnet.com/.


[13] http://www.ndigital.com/polaris.php.
[14] http://www.ptiphoenix.com/.
[15] http://www.simtechniques.com/.
[16] http://www.peakperform.com/.
[17] J. Aggarwal and Q. Cai. Human motion analysis: A review. Computer Vision and Image Understanding, 73(3):428–440, 1999.

[18] K. Akita. Image sequence analysis of real world human motion. Pattern Recognition, 17:73–83, 1984.
[19] F. Amirabdollahian, R. Louerio, and W. Harwin. A case study on the effects of a haptic interface on human arm movements with implications for rehabilitation robotics. In Proc. of the 1st Cambridge Workshop on Universal Access and Assistive Technology (CWUAAT), 25-27th March, University of Cambridge, 2002.
[20] C. Barron and I. Kakadiaris. A convex penalty method for optical human motion tracking. In IWVS'03, Berkeley, pages 1–10, Nov 2003.
[21] A. Baumberg and D. Hogg. An efficient method for contour tracking using active shape models. In Proc. IEEE Workshop on Motion of Non-Rigid and Articulated Objects, pages 194–199, 1994.
[22] A. Baumberg and D. Hogg. Generating spatiotemporal models from examples. IVC, 14:525–532, 1996.
[23] A. Bharatkumar, K. Daigle, M. Pandy, Q. Cai, and J. Aggarwal. Lower limb kinematics of human walking with the medial axis transformation. In Proc. of IEEE Workshop on Non-Rigid Motion, pages 70–76, 1994.
[24] D. Bhatnagar. Position trackers for head mounted display systems. Technical Report TR93-010, Department of Computer Sciences, University of North Carolina, 1993.
[25] M. Black, Y. Yacoob, A. Jepson, and D. Fleet. Learning parameterized models of image motion. In Proc. of CVPR, pages 561–567, 1997.
[26] A. Blake, R. Curwen, and A. Zisserman. A framework for spatio-temporal control in the tracking of visual contours. Int. J. Computer Vision, pages 127–145, 1993.
[27] R. Bodor, B. Jackson, O. Masoud, and N. Papanikolopoulos. Image-based reconstruction for view-independent human motion recognition. In Int. Conf. on Intel. Robots and Sys., 27-31 Oct 2003.
[28] C. Bouten, K. Koekkoek, M. Verduin, R. Kodde, and J. Janssen. A triaxial accelerometer and portable processing unit for the assessment of daily physical activity. IEEE Trans. on Biomedical Eng., 44(3):136–147, 1997.
[29] R. Bowden, T. Mitchell, and M. Sarhadi. Reconstructing 3D pose and motion from a single camera view. In BMVC, pages 904–913, Southampton, 1998.
[30] G. Bradski and J. Davies. Motion segmentation and pose recognition with motion history gradients. 13:174–184, 2002.
[31] T. Brosnihan, A. Pisano, and R. Howe. Surface micromachined angular accelerometer with force feedback. In Digest ASME Inter. Conf. and Exp., Nov 1995.
[32] S. Bryson. Virtual reality hardware. In Implementing Virtual Reality, ACM SIGGRAPH 93, pages 1.3.16–1.3.24, New York, 1993.
[33] C. Burgar, P. Lum, P. Shor, and H. Machiel Van der Loos. Development of robots for rehabilitation therapy: The Palo Alto VA/Stanford experience. Journal of Rehab. Res. and Devlop., 37(6):663–673, 2000.
[34] Q. Cai and J. K. Aggarwal. Tracking human motion using multiple cameras. In ICPR96, pages 68–72, 1996.
[35] J. Cauraugh and S. Kim. Two coupled motor recovery protocols are better than one: electromyogram-triggered neuromuscular stimulation and bilateral movements. Stroke, 33:1589–1594, 2002.
[36] S. Chen, T. Rahman, and W. Harwin. Performance statistics of a head-operated force-reflecting rehabilitation robot system. IEEE Trans. on Rehab. Eng., 6:406–414, Dec 1998.
[37] Z. Chen and H. J. Lee. Knowledge-guided visual perception of 3D human gait from a single image sequence. IEEE Trans. on Systems, Man, and Cybernetics, pages 336–342, 1992.
[38] J. Chung and N. Ohnishi. Cue circles: Image feature for measuring 3-D motion of articulated objects using sequential image pair. In AFGR98, pages 474–479, 1998.
[39] M. Cordea, E. Petriu, N. Georganas, D. Petriu, and T. Whalen. Real-time 2½D head pose recovery for model-based video-coding. In IEEE Instrum. and Measure. Tech. Conf., Baltimore, MD, May 2000.
[40] J. Cozens. Robotic assistance of an active upper limb exercise in neurologically impaired


patients. IEEE Transactions on Rehabilitation Engineering, 7(2):254–256, 1999.

[41] R. Cutler and M. Turk. View-based interpretation of real-time optical flow for gesture recognition. In Int. Conf. on Auto. Face and Gest. Recog., pages 416–421, 1998.
[42] J. Dallaway, R. Jackson, and P. Timmers. Rehabilitation robotics in Europe. IEEE Trans. on Rehab. Eng., 3:35–45, 1995.
[43] K. Dautenhahn and I. Werry. Issues of robot-human interaction dynamics in the rehabilitation of children with autism. In Proc. of FROM ANIMALS TO ANIMATS, the Sixth International Conference on the Simulation of Adaptive Behavior (SAB2000), 11-15 Sep, Paris, 2000.
[44] S. Dockstader, M. Berg, and A. Tekalp. Stochastic kinematic modeling and feature extraction for gait analysis. IEEE Transactions on Image Processing, 12(8):962–976, Aug 2003.
[45] I. Dukes. Compact motion tracking system for human movement. MSc dissertation, University of Essex, Sep 2003.
[46] F. Escolano, M. Cazorla, D. Gallardo, and R. Rizo. Deformable templates for tracking and analysis of intravascular ultrasound sequences. In 1st International Workshop on Energy Minimization Methods in CVPR, Venice, May 1997.
[47] R. Fablet and M. J. Black. Automatic detection and tracking of human motion with a view-based representation. In ECCV02, pages 476–491, 2002.
[48] H. Feys, W. De Weerdt, B. Selz, C. Steck, R. Spichiger, L. Vereeck, K. Putman, and G. Van Hoydonck. Effect of a therapeutic intervention for the hemiplegic upper limb in the acute phase after stroke: a single-blind, randomized, controlled multicenter trial. Stroke, 29:785–792, 1998.
[49] W. Freeman, K. Tanaka, J. Ohta, and K. Kyuma. Computer vision for computer games. In Proc. of IEEE International Conference on Automatic Face and Gesture Recognition, pages 100–105, 1996.
[50] H. Fujiyoshi and A. Lipton. Real-time human motion analysis by image skeletonisation. In Proc. of the Workshop on Application of Computer Vision, Oct 1998.
[51] D. Gavrila. The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73(1):82–98, 1999.
[52] L. Goncalves, E. Bernardo, E. Ursella, and P. Perona. Monocular tracking of the human arm in 3D. In ICCV95, pages 764–770, 1995.
[53] J. Gonzalez, I. Lim, P. Fua, and D. Thalmann. Robust tracking and segmentation of human motion in an image sequence. In ICASSP03, Hong Kong, April 2003.
[54] R. Green. Spatial and temporal segmentation of continuous human motion from monocular video images. In Proc. of Image and Vision Computing New Zealand, pages 163–169, 2003.
[55] G. Haisong, Y. Shirai, and M. Asada. MDL-based segmentation and motion modeling in a long image sequence of scene with multiple independently moving objects. 18:58–64, 1996.
[56] W. Harwin and T. Rahman. Analysis of force-reflecting telerobotic systems for rehabilitation applications. In Proc. of the 1st European Conf. on Dis., Virt. Real. and Assoc. Tech., pages 171–178, Maidenhead, 1996.
[57] P. Hawkins, J. Smith, S. Alcock, M. Topping, W. Harwin, R. Loureiro, F. Amirabdollahian, J. Brooker, S. Coote, E. Stokes, G. Johnson, P. Mark, C. Collin, and B. Driessen. Gentle/s project: design and ergonomics of a stroke rehabilitation system. In Proc. of the 1st Cambridge Workshop on Universal Access and Assistive Technology (CWUAAT), 25-27th March, University of Cambridge, 2002.
[58] S. Hesse and D. Uhlenbrock. A mechanized gait trainer for restoration of gait. Journal of Rehab. Res. and Devlop., 37(6):701–708, 2000.
[59] A. Hilti, I. Nourbakhsh, B. Jensen, and R. Siegwart. Narrative-level visual interpretation of human motion for human-robot interaction. In Proceedings of IROS 2001, Maui, Hawaii, 2001.
[60] D. Hogg. Model-based vision: A program to see a walking person. Image and Vision Computing, 1:5–20, 1983.
[61] T. Horprasert, I. Haritaoglu, D. Harwood, L. Davies, C. Wren, and A. Pentland. Real-time 3D motion capture. In PUI Workshop, 1998.
[62] http://bmj.bmjjournals.com/cgi/reprint/325/.
[63] http://www.vicon.com/.


[64] http://www.xsens.com/.[65] E. Huber. 3d real-time gesture recognition us-

ing proximity space. In Proc. of Intl. Conf. onPattern Recognition, pages 136–141, August1996.

[66] M. Isard and A. Blake. Contour tracking bystochastic propagation of conditional density.In ECCV, pages 343–356, 1996.

[67] M. Ivana, M. Trivedi, E. Hunter, and P. Cos-man. Human body model acquisition andtracking using voxel data. Int. J. of Comp. Vis.,53(3):199–223, 2003.

[68] O. Javed, S. Khan, Z. Rasheed, and M. Shah.Camera handoff: tracking in multiple uncal-ibrated stationary cameras. In IEEE Work-shop on Human Motion, HUMO-2000, pagesAustin, TX, 2000.

[69] G. Johansson. Visual motion perception. Sci-entific American, 232:76–88, 1975.

[70] K. Kahol, P. Tripathi, and S. Panchanathan.Gesture segmentation in complex motion se-quences. In ICIP, pages Barcelona, Spain,2003.

[71] A. Kourepenis, A. Petrovich, and M. Meinberg.Development of a monotithic quartz resonatoraccelerometer. In Proc. of 14th Biennial Guid-ance Test Symp., Hollman AFB, NM, 2-4 Oct1989.

[72] H. Krebs, N. Hogan, M. Aisen, and B. Volpe.Robot-aided neurorehabilitation. IEEETransactions on Rehabilitation Engineering,6(1):75–87, Mar 1998.

[73] H. Krebs, B. Volpe, M. Aisen, and N. Hogan.Increasing productivity and quality of care:robot-aided nero-rehabilitation. Journal ofRehabilitation Research and Development,37(6):639–652, November/December 2000.

[74] S. Kurakaka and R. Nevatia. Description andtracking of moving articulated objects. InICPR92, pages 491–495, 1992.

[75] W. Long and Y. Yang. Log-tracker: Anattribute-based approach to tracking humanbody motion. PRAI, 5:439–458, 1991.

[76] P. Lum, D. Reinkensmeyer, R. Mahoney,W. Rymer, and C. Burgar. Robotic devicesfor movement therapy after stroke: current sta-tus and challenges to clinical acceptance. TopStroke Rehabil., 8(4):40–53, 2002.

[77] M. Machline, L. Zelnik-Manor, and M. Irani.Multi-body segmentation: Revisiting motion

consistency. In International Workshop on Vi-sion and Modeling of Dynamic Scenes (withECCV02), 2002.

[78] D. Marr and K. Nishihara. Representationand recognition of the spatial organization ofthree dimensional structure. Proceedings of theRoyal Society of London, 200:269–294, 1978.

[79] T. Moeslund and E. Granum. Multiple cuesused in model-based human motion capture. InFG’00, pages Grenoble, France, 2002.

[80] A. Mulder. Human movement tracking technology. Technical Report 94-1, Simon Fraser University, 1994.

[81] S. Niyogi and E. Adelson. Analyzing and recognizing walking figures in XYT. In CVPR, pages 469–474, 1994.

[82] J. O’Rourke and N. Badler. Model based image analysis of human motion using constraint propagation. PAMI, 2:522–536, 1980.

[83] X. Pennec, P. Cachier, and N. Ayache. Tracking brain deformations in time sequences of 3D US images. In Proc. of IPMI'01, pages 169–175, 2001.

[84] R. Polana and R. Nelson. Low level recognition of human motion. In Proc. of Workshop on Non-rigid Motion, pages 77–82, 1994.

[85] C. Poynton. A Technical Introduction to Digital Video. New York: Wiley, 1996.

[86] R. Qian and T. Huang. Estimating articulated motion by decomposition. In Time-Varying Image Processing and Moving Object Recognition, 3, V. Cappellini (Ed.), pages 275–286, 1994.

[87] J. Rehg and T. Kanade. Model-based tracking of self-occluding articulated objects. In ICCV, pages 612–617, 1995.

[88] D. Reinkensmeyer, L. Kahn, M. Averbuch, A. McKenna-Cole, B. Schmit, and W. Rymer. Understanding and treating arm movement impairment after chronic brain injury: Progress with the ARM Guide. Journal of Rehab. Res. and Develop., 37(6):653–662, 2000.

[89] R. Richardson, M. Austin, and A. Plummer. Development of a physiotherapy robot. In Proc. of the Intern. Biomech. Workshop, Enschede, pages 116–120, April 1999.

[90] M. Ringer and J. Lasenby. Modelling and tracking articulated motion from multiple camera views. In BMVC, Bristol, UK, Sep 2000.


[91] A. Roche, X. Pennec, G. Malandain, and N. Ayache. Rigid registration of 3D ultrasound with MR images: a new approach combining intensity and gradient information. IEEE Transactions on Medical Imaging, 20(10):1038–1049, Oct 2001.

[92] K. Rohr. Toward model-based recognition of human movements in image sequences. Computer Vis., Graphics Image Process., 59:94–115, 1994.

[93] R. Ronfard, C. Schmid, and B. Triggs. Learning to parse pictures of people. In European Conference on Computer Vision, LNCS 2553, volume 4, pages 700–714, June 2002.

[94] K. Sato, T. Maeda, H. Kato, and S. Inokuchi. CAD-based object tracking with distributed monocular camera for security monitoring. In Proc. 2nd CAD-Based Vision Workshop, pages 291–297, 1994.

[95] N. Shimada, K. Kimura, Y. Shirai, and Y. Kuno. Hand posture estimation by combining 2-D appearance-based and 3-D model-based approaches. In ICPR00, 2000.

[96] H. Sidenbladh, M. J. Black, and D. Fleet. Stochastic tracking of 3D human figures using 2D image motion. In ECCV, pages 702–718, Dublin, June 2000.

[97] H. Sidenbladh, M. J. Black, and L. Sigal. Implicit probabilistic models of human motion for synthesis and tracking. In ECCV, 2002.

[98] C. Sminchisescu and B. Triggs. Covariance scaled sampling for monocular 3D body tracking. In Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 447–454, Kauai, Hawaii, 2001.

[99] Y. Song, X. Feng, and P. Perona. Towards detection of human motion. In CVPR, pages 810–817, 2000.

[100] M. Storring, H. Andersen, and E. Granum. Skin colour detection under changing lighting conditions. In Proc. of 7th Symposium on Intelligent Robotics Systems, 1999.

[101] S. Stroud. A force controlled external powered arm orthosis. Masters Thesis, University of Delaware, 1995.

[102] D. Sturman and D. Zeltzer. A survey of glove-based input. IEEE Computer Graphics and Applications, pages 30–39, 1994.

[103] Y. Tao and H. Hu. Building a visual tracking system for home-based rehabilitation. In Proc. of the 9th Chinese Automation and Computing Society Conf. in the UK, England, Sep 2003.

[104] A. Taylor. Design of an exoskeletal arm for use in long term stroke rehabilitation. In ICORR'97, University of Bath, April 1997.

[105] C. Theobalt, M. Magnor, P. Schueler, and H. Seidel. Combining 2D feature tracking and volume reconstruction for online video-based human motion capture. In Proceedings of Pacific Graphics 2002, pages 96–103, Beijing, 2002.

[106] M. Uiterwaal, E. Glerum, H. Busser, and R. Van Lummel. Ambulatory monitoring of physical activity in working situations, a validation study. Journal of Medical Engineering & Technology, 22:168–172, July/August 1998.

[107] P. Veltink, H. Bussmann, W. de Vries, W. Martens, and R. Van Lummel. Detection of static and dynamic activities using uniaxial accelerometers. IEEE Transactions on Rehabilitation Engineering, 4(4):375–385, Dec 1996.

[108] G. Welch and E. Foxlin. Motion tracking survey. IEEE Computer Graphics and Applications, pages 24–38, Nov/Dec 2002.

[109] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):780–785, 1997.

[110] H. Xie and G. Fedder. A CMOS z-axis capacitive accelerometer with comb-finger sensing. Technical Report, The Robotics Institute, Carnegie Mellon University, 2000.

[111] Y. Yacoob and M. Black. Parameterized modeling and recognition of activities. In ICCV, pages 120–127, 1998.

[112] Y. Yacoob and L. Davis. Learned models for estimation of rigid and articulated human motion from stationary or moving camera. Int. Journal of Computer Vision, 36(1):5–30, 2000.

[113] M. Yeasin and S. Chaudhuri. Development of an automated image processing system for kinematic analysis of human gait. Real-Time Imaging, 6:55–67, 2000.

[114] M. Yeasin and S. Chaudhuri. Towards automatic robot programming: Learning human skill from perceptual data. IEEE Trans. on Systems, Man and Cybernetics-B, 30(1):180–185, 2000.

[115] M. Yeasin and S. Chaudhuri. Visual understanding of dynamic hand gestures. Pattern Recognition, 33(11):1805–1817, 2000.


[116] L. Zelnik-Manor, M. Machline, and M. Irani. Multi-body segmentation: Revisiting motion consistency. In International Workshop on Vision and Modeling of Dynamic Scenes (with ECCV02), 2002.

[117] T. Zimmerman and J. Lanier. Computer data entry and manipulation apparatus and method. US Patent 5,026,930, VPL Research Inc., 1992.

[118] A. Zisserman, L. Shapiro, and M. Brady. 3D motion recovery via affine epipolar geometry. Int. J. of Comput. Vis., 16:147–182, 1995.
