NRI: INT: Individualized Co-Robotics


Figure 1: Design studies for the full lower body co-robot emulator and portable version (far right).

“The NRI-2.0 program significantly extends this theme [collaborative robots (co-robots)] to focus on issues of scalability [and variety of behavior]: ...; how robots can be designed [optimized online] to facilitate achievement of a variety of tasks in a variety of environments [for a variety of users], with minimal modification to the hardware and software; ...” [NSF17518]

Future physical co-robots will need to be easily customizable. This proposal focuses on rapidly individualizing physical co-robots and other physical interfaces using optimization. Our preliminary results with an ankle exoskeleton are exciting, with about a 20-25% reduction in metabolic cost after optimization customizes assistance and augmentation for an individual user performing a specific task. We have seen similar preliminary results in a range of conditions including unilateral and bilateral ankle assistance during walking at various speeds with various loads, walking uphill and downhill, and running. This proposal focuses on 1) Improving the customization process, including exploring other optimization parameters and structure, optimization algorithms, and parameter settings for the current algorithm (CMA-ES). Currently customization takes about one hour to optimize an interaction policy for a single behavior for most individuals. Our long-term goal is to build a library of customized interaction policies for an individual performing a variety of tasks, and tune the library online as the user does desired tasks. 2) Developing a model of the co-optimization process, in which a robot optimizes an interaction policy while a human is simultaneously adapting and optimizing their behavior for a different optimization criterion. When does this process converge to desirable optima? What undesired transient optimization behaviors need to be detected and reduced or eliminated? 3) Understanding the effects of gross muscle properties (a Hill-type model) and molecular behavior (a Huxley-type model) on the co-optimization process. We hope to be able to predict the final equilibrium of any human-robot co-optimization (or co-learning) process. 4) We will evaluate our results on lower body physical co-robot testbeds we propose to build (Figure 1), the first in a laboratory using a treadmill, and the second outdoors on irregular terrain. The outdoors testbed will use a similar exoskeleton frame, but replace the benchtop actuators and lab power supply with smaller motors and batteries. These systems will take advantage of our improved optimization processes based on our improved understanding of co-optimization and muscle modeling, and produce new data to further improve our understanding, in a virtuous cycle.

A Mystery: Based on our prior co-robot augmentation experiments [16, 18, 38], optimal augmentation is usually achieved with exoskeleton torques that are about half the values of human joint torques observed during unassisted locomotion. Why don’t users fully relax and let the co-robot do all the work, as patients did with early rehabilitation robots? Hypotheses as to why this is the case include nonlinear muscle properties, discomfort caused by forces being applied to soft tissue instead of bones, a desire on the part of the subject to avoid muscles completely relaxing and going slack during the imposed movement, a desire to remain “in control”, and lack of trust of the co-robot by the user. In order to increase acceptance of physical co-robots, we need to understand this issue more clearly. In addition, when designing physical co-robots it is important to consider how muscle-tendon mechanics might change due to interactions with the device and to ensure that any compromised function of the biological system, such as detuning the mass-spring dynamics of legged locomotion, is sufficiently compensated for. The proposed work will help us understand these issues, and identify what is desirable in human-robot physical interfaces.

Figure 2: (A) Parameterization of ankle torque augmentation. Each control law determined applied torque as a function of time, normalized to stride period, as a cubic spline defined by peak time, rise time, fall time and peak torque. (B) Examples of possible torque patterns in this space. (C) Co-robot emulator system used to apply torque to the human ankle in experiments. Off-board motor and control hardware actuated a tethered exoskeleton worn on one ankle while participants walked on a treadmill.

Why is the proposed research appropriate for the NRI 2.0 program? From the call for proposals: “To scale up effectively, robots will need to be easily customizable .... Features of both the hardware and software should facilitate robots achieving a wide variety of tasks, in a wide variety of situations, for a wide diversity of people.” We are investigating easily customizable robots for achieving a variety of tasks in a variety of situations, as well as facilitating physical collaboration (including peer-to-peer; collaborative manipulation; and augmentation of human capabilities). Users and co-robots must learn to work together, and thus we need to understand what happens when two optimizers or learning systems are interacting yet working towards different goals. We note the NSF’s support of co-robot assistance and exoskeletons. The DoD is interested in co-robots that augment users (soft exosuits and the SOCOM Talos program, for example). The DoD is also interested in “dynamic modeling of the human-robot partnerships to allow continuous improvement of joint performance in real-world applications, as well as investigations regarding the effectiveness of various models of human-robot interaction.” We note the DOE’s interest in improving worker ergonomics as well as reducing physical demands and stress using exoskeletons. We believe our approach can augment workers while protecting them from internal injuries as well as learn the unique movements of a particular user. We note that interdisciplinary research and research in collaboration with government labs is especially encouraged. We are currently working with the U.S. Army Natick Soldier Research, Development and Engineering Center (NSRDEC) (Collins) on exoskeletons and are negotiating a contract with the US Special Operations Command (SOCOM) (Atkeson) on building an exoskeleton through a company (Apptronik).

Our work with laboratory and outdoor co-robot testbeds will allow us to evaluate our approach on complete physical co-robotic systems in real-world settings, integrating relevant technologies. Our longer-term vision is to use our co-robot testbeds to explore physical designs and develop software for future co-robot systems. Versatile testbed systems like the ones employed here could be used to identify optimal device characteristics during a design or prescription process, and then customized mobile devices, adaptive or static, could be fabricated. We expect that the resulting high-performance exoskeletons, prostheses and other devices will be used to improve mobility for people with a wide range of unique physiological needs, from individuals with amputation or disability due to stroke to athletes and soldiers. Our objectives can only be attained by combining expertise in biomechanics, exoskeletons, and optimization, rather than with just a collection of smaller projects provided with similar resources. The overall impact of the proposed joint work on co-robot science and engineering will indeed be greater than the sum of potential individual investigator contributions.


Motivation

Physical co-robots in general have great promise, but few have yet enhanced performance. A critical obstacle may be the reliance on intuition and hand tuning when determining device function. We have developed a method for automatically identifying optimal assistance and augmentation patterns for individual humans (Figure 2). In preliminary tests an evolution-inspired optimization algorithm (CMA-ES), tolerant of measurement noise and human adaptation, determined augmentation torque patterns for each subject that minimized a rapidly-updated estimate of metabolic rate. After optimizing augmentation for an exoskeleton worn on one ankle, participants (N=11) experienced a 23.6±8.0% decrease in metabolic cost compared to a zero-torque condition. This exceptional improvement in energy economy arose from customized augmentation patterns, which varied widely across participants, and from facilitation of human motor adaptation. Optimizing human performance can dramatically improve the effectiveness of assistive and augmentation devices for users with diverse physiological needs.

Methods for automatically discovering, customizing and continuously adapting assistance and augmentation would overcome these challenges, allowing physical co-robots to achieve their potential. We call one such approach, in which device control is systematically varied during use so as to maximize human performance, human-in-the-loop optimization. However, closing the loop on human performance is also challenging. Objective functions based on measurements of human performance typically require lengthy evaluation periods and contain substantial noise; the best available estimate of metabolic energy cost, for example, requires about one minute of respiratory data per evaluation [68, 46]. The human part of the system also has time-varying dynamics, because humans may adapt slowly to new device behaviors and change their reactions with new exposures [69]. The spaces explored are often high-dimensional, because control laws that are general enough to approximate globally-optimal assistance and augmentation strategies are likely to require multiple parameters per assisted joint [31]. Initial efforts in this domain have demonstrated the ability to optimize a single gait or device parameter using line search [24] or gradient descent [46], but these methods are inefficient, due to sensitivity to noise and drift, and scale poorly, particularly in the presence of parametric interactions. Many optimization methods that work well in simulation [72] are subject to these problems; building a quadratic approximation takes time, and the human is changing during that time.

Related Work

Discovering effective strategies for assisting human gait is challenging. For more than a century, inventors and scientists have developed exoskeletons and prostheses intended to improve human locomotor performance, particularly in terms of energy economy [21]. Few approaches have been successful [33, 53, 59, 20, 50], however, with only modest enhancements compared to the potential benefits expected based on simulations [80, 31, 77]. An overreliance on intuition and specialized hardware may be partially responsible for these shortcomings. Assistance and augmentation strategies have typically been derived from mathematical models [19], biomechanics observations [81], and humanoid robots [40], but each of these sources of inspiration simplifies important aspects of the human-robot system [36]. Experiments have primarily been conducted using specialized prototypes that embed a single intuited functionality, with each prototype requiring years of development, limiting exploration to only a small set of potential assistance and augmentation strategies. Compounding the challenge, physiological and neurological differences between individuals can cause divergent responses to the same device [94, 38, 62], and responses can vary strongly during the course of adaptation [30, 69].

Preliminary Work

In preliminary work on online optimization of physical interfaces, participants wore a torque-controlled, tethered exoskeleton on one ankle (Figure 2). The exoskeleton applied torque as a function of time when the foot was on the ground, defined by four control parameters that set the magnitude of peak torque and the timing of torque onset, peak and removal, constituting a control law. During the optimization phase of the experiment, an optimization algorithm, Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [32], was used to identify the control law that minimized the metabolic energy cost of walking for each participant. The metabolic rate corresponding to each control law was estimated by fitting a first-order model to two minutes of breath-by-breath metabolic rate data, using an inverse dynamics approach similar to [68, 24].

Figure 3: (D) Optimized ankle exoskeleton torque pattern for each participant. Patterns varied widely and spanned a large portion of the allowable space. Lines are measured torque, normalized to stride time and averaged across strides. (E) Torques applied in the static and zero-torque conditions. The static pattern, based on [38], is similar to the optimized patterns, but resulted in higher metabolic rate. Torques were negligible in the zero-torque mode. Lines are measured torque, normalized to stride time and averaged across strides and participants.
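The first-order fit used to estimate the metabolic rate of each control law in the preliminary work can be illustrated with a minimal sketch. It assumes a first-order response with a known time constant, so that the steady-state rate enters the model linearly and can be found by least squares. The function name, the 42 s time constant and the synthetic data are illustrative assumptions, not the fitting procedure of [68, 24].

```python
import numpy as np

def estimate_steady_state(t, y, tau=42.0):
    """Fit y(t) = y_ss + (y_0 - y_ss) * exp(-t / tau) by linear least squares.

    t   : breath times in seconds since the control law was switched on
    y   : breath-by-breath metabolic rate samples (e.g. W/kg)
    tau : assumed time constant in seconds (placeholder, not from the proposal)
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    decay = np.exp(-t / tau)
    # The model is linear in (y_ss, y_0): y = y_ss * (1 - decay) + y_0 * decay
    design = np.column_stack([1.0 - decay, decay])
    (y_ss, y_0), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(y_ss), float(y_0)

# Synthetic two-minute trial: rate relaxes from 2.9 toward 2.2 W/kg (made-up numbers).
rng = np.random.default_rng(0)
t = np.arange(0.0, 120.0, 3.0)
clean = 2.2 + (2.9 - 2.2) * np.exp(-t / 42.0)
noisy = clean + 0.1 * rng.standard_normal(t.size)
y_ss_clean, _ = estimate_steady_state(t, clean)
y_ss_noisy, _ = estimate_steady_state(t, noisy)
print(y_ss_clean, y_ss_noisy)
```

Because the trial is shorter than the time needed to reach steady state, the fit extrapolates to the asymptote rather than averaging the last samples, which is what makes short evaluations usable at all.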

Optimized augmentation substantially improved energy economy for all participants, confirming the effectiveness of the algorithm. Parameters that minimized energy expenditure were identified after four generations (64 min of walking) for all but two participants; Subjects 6 and 10 appeared to become trapped in local minima, requiring a reset of the algorithm and additional walking (128 and 208 min total, respectively). In separate validation trials, optimized augmentation reduced the metabolic cost of walking to 2.20±0.43 W/kg (mean±standard deviation), down from 2.87±0.39 W/kg for walking with the exoskeleton in a fully passive zero-torque mode, an average reduction of 23.6±8.0%. The range of energy cost reductions was 14.2-41.5%. By the same measure, the largest average reduction provided by hand-tuned exoskeletons worn on both legs has been 14.5% [59]. Walking in street shoes is about 9.7% less costly than the zero-torque mode, suggesting about a 14% net improvement with optimal augmentation compared to normal walking, also exceptionally large.
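The percentages in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the reported group means and percentages. The variable names are ours, and the two readings of the net improvement are our interpretation, since the text does not state how the figure of about 14% was obtained.

```python
optimized = 2.20       # W/kg, mean metabolic cost with optimized augmentation
zero_torque = 2.87     # W/kg, mean metabolic cost in zero-torque mode
mean_reduction = 0.236 # reported mean of the per-participant reductions
shoes_saving = 0.097   # street shoes relative to zero-torque mode

# Reduction computed from the two group means (a ratio of means).
ratio_of_means_reduction = 1.0 - optimized / zero_torque

# Reading 1: subtract percentage points, both relative to zero-torque cost.
net_points = mean_reduction - shoes_saving

# Reading 2: express the optimized cost relative to the street-shoe cost.
net_relative = 1.0 - (1.0 - mean_reduction) / (1.0 - shoes_saving)

print(f"{ratio_of_means_reduction:.1%} {net_points:.1%} {net_relative:.1%}")
```

The ratio of the group means gives a slightly different value than the reported mean of individual reductions, which is expected when reductions are averaged per participant. The reported net improvement of about 14% matches the first reading.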

Optimal augmentation patterns varied widely across subjects (Figure 3), demonstrating the importance of customization. For example, optimal torque onset timing ranged from 17% to 37% of the stride period (Figure 3D), or 26% to 57% of the stance phase, about half the testable range in this and prior studies [53]. Optimized augmentation did not replace ankle torque as much as possible, nor did it provide the maximum possible positive mechanical work, inconsistent with some predictions [59, 31, 77]. Optimized torque patterns (Figure 3D) do share some qualitative features with each other, such as a peak torque that occurs at about 50% stride, suggesting qualities that may be beneficial for most people and useful initial parameters for future optimizations. However, even subtle differences in torque can have large and unexpected effects on energy use arising from complex interactions with the musculoskeletal and nervous systems [23, 38, 39]. Human-in-the-loop optimization accommodates this complexity.

Comparisons to a static control law confirm the advantages of optimization and customization and suggest that facilitating human motor adaptation may be critical. During validation trials, we included a static control law designed to maximize positive ankle work (Figure 3E) that we previously found to reduce metabolic rate by 5.8% compared to the zero-torque mode in tests with the same emulator system [38]. In the current study, optimized control resulted in 5.2±7.2% lower metabolic rate than with the static control law. This is a substantial improvement in performance, equivalent to previous findings for the total reduction due to augmentation [20]. Interestingly, the static control law yielded a much larger benefit compared to the zero-torque condition than previously observed, a 19.1±8.8% reduction in metabolic cost. This difference suggests an interaction between co-robot behavior and human motor adaptation. Participants had similar duration of exposure to the co-robot in both studies, but in the prior study were trained with a narrow range of eight static controllers, whereas they experienced 32 diverse controllers here. The wide-ranging, sometimes uncomfortable, control laws participants were exposed to during the optimization process may have forced them to explore new motor control strategies, which has been shown to be a necessary part of skill acquisition in some interventions [69]. While one may think of the co-robot as primarily adapting to the human, co-adaptation between human and robot seems to be essential to improved performance.

We have also conducted single-subject experiments on other conditions, with exciting results as well. We demonstrated the generality of our approach by applying it to several additional devices and locomotion conditions. One participant from the first study (N=1) wore an ankle exoskeleton [87] on each leg and experienced a 30% reduction in metabolic rate with optimized assistance compared to the bilateral zero-torque condition, demonstrating the effectiveness of the approach across different co-robot types. The improvement was larger than the 20% they experienced with assistance at only one leg, suggesting that augmentation at additional joints will lead to greater improvements in performance. In this test we also measured a 17% reduction in metabolic rate compared to walking with no exoskeleton, confirming the expected benefits in absolute performance. In tests on another participant (N=1), we found the algorithm to be effective at reducing the metabolic cost of walking at a typical speed (1.25 m/s; 28% vs. zero-torque; 24% vs. no exoskeleton), walking at a faster speed (1.75 m/s; 34% vs. zero-torque; 24% vs. no exoskeleton) and walking uphill (10% uphill grade; 22% vs. zero-torque; 18% vs. no exoskeleton). Interestingly, our approach is failing to improve slow walking (0.75 m/s; N=1), driving the applied torque to zero. We suspect this is because an open-loop torque profile is not an appropriate augmentation policy for slow behaviors. Rather, a closed-loop feedback law would make more sense. These results demonstrate the effectiveness of the algorithm across different walking conditions, including cases where the best action is none at all. We also applied the approach to running with bilateral ankle exoskeletons (N=1), and found a 30% improvement in energy economy compared to the zero-torque mode and a 13% savings compared to running in normal shoes. This demonstrates the effectiveness of the algorithm across different gaits.

Proposed Research: Exploring Other Optimization Algorithms

We will explore a range of online, data-efficient optimization algorithms in addition to further exploration and tuning of CMA-ES. We will explore issues that span many possible optimization algorithms, such as convergence criteria. Algorithms we will test in simulation, and if successful, in actual experiments with human subjects include grid search, sequential one-dimensional hill-climbing (Powell’s method), ordinal support vector machine, Nelder-Mead/Simplex/Amoeba, linear modeling and LQR optimal control design, and trajectory optimization based on EMG and kinematic variables.

In early pilot tests of related methods, model-explicit optimization techniques seemed to be ineffective due to sensitivity to noise and adaptation dynamics. CMA-ES worked well in pilot tests. The method is well-suited to human-in-the-loop optimization because it handles noisy measurements, expensive objective function evaluations, nonlinear objective functions with unknown structures, and complex, subject-dependent human learning and adaptation processes well. CMA-ES is stochastic, which makes it less sensitive to noise than derivative-based methods such as gradient descent and hill-climbing methods such as Nelder-Mead. CMA-ES includes mechanisms to grow or shrink the standard deviation of the randomly-selected parameter values depending on the evolution of the estimate of the mean over time. These features make CMA-ES more robust against thresholds, discontinuities and local minima, as long as the initial values of the mean, covariance matrix, and standard deviation are well-chosen. CMA-ES is less sensitive to noise and drift because it uses only the rank order of the objective function values (trial scores), rather than the actual objective function values or their partial derivatives.
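The point about rank order can be made concrete with a deliberately simplified sketch. The code below is not CMA-ES: it keeps the sampling distribution isotropic with a fixed standard deviation and omits covariance and step-size adaptation. It keeps only the rank-based weighted recombination of the best candidates, which is enough to show that any monotone rescaling of the scores leaves the search unchanged. All names, the test objective and the parameter values are illustrative assumptions.

```python
import numpy as np

def rank_based_es(objective, mean, sigma=0.5, popsize=8, generations=30, seed=0):
    """Simplified evolution strategy using only the ranking of trial scores."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    mu = popsize // 2
    # Log-decreasing recombination weights for the best mu candidates.
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()
    for _ in range(generations):
        candidates = mean + sigma * rng.standard_normal((popsize, mean.size))
        scores = np.array([objective(c) for c in candidates])
        order = np.argsort(scores)  # only the ranking is used from here on
        mean = weights @ candidates[order[:mu]]
    return mean

target = np.array([2.0, 2.0, 2.0, 2.0])  # hypothetical optimum of a 4-parameter law

def cost(x):
    return float(np.sum((x - target) ** 2))

def warped(x):
    # Monotone transform of cost: different values, same ranking.
    return 10.0 * cost(x) ** 3 + 5.0

start = np.zeros(4)
mean_plain = rank_based_es(cost, start)
mean_warped = rank_based_es(warped, start)
print(mean_plain, mean_warped)
```

With the same random seed, both runs visit the same candidates and return the same mean, because the transformed scores sort identically. A gradient-based method would take very different steps on the two objectives.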


Proposed Research: Exploring Other Policy Parameterizations

We will explore a much wider range of control policies in the proposed work as compared to the open-loop torque trajectories used in the preliminary work. In that work the co-robot applied torque as a function of time when the foot was on the ground, defined by four control parameters that set the magnitude of peak torque and the timing of torque onset, peak and removal, relative to ground contact and normalized to the current stride period (Figure 2). The curve was composed of two cubic splines. Additional low-torque ramp-in and ramp-out patterns were included for the rest of the stance phase to improve torque tracking. Stride period was estimated online by low-pass filtering measured stride periods during walking. The ankle torque curve had a hill-like shape that can be divided into four sections: a shallow, low-torque setup ramp; a rising s-like cubic spline linking the onset point to the peak; a falling arc-like cubic spline linking the peak to the removal point; and a shallow, low-torque settling ramp.
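A minimal sketch of this four-parameter torque pattern follows. It is an approximation under stated assumptions: both the rising and the falling segment use the same zero-slope cubic (a smoothstep), the low-torque setup and settling ramps are omitted, and the function name and example parameter values are hypothetical rather than taken from the experiments.

```python
def ankle_torque(phase, peak_time, rise_time, fall_time, peak_torque):
    """Desired torque (N*m) at a given stride phase (percent of stride).

    peak_time, rise_time and fall_time are in percent stride; torque onset is
    peak_time - rise_time and torque removal is peak_time + fall_time.
    """
    onset = peak_time - rise_time
    removal = peak_time + fall_time
    if phase <= onset or phase >= removal:
        return 0.0  # setup and settling ramps are not modeled in this sketch
    if phase < peak_time:
        u = (phase - onset) / rise_time    # 0 at onset, 1 at the peak
    else:
        u = (removal - phase) / fall_time  # 1 at the peak, 0 at removal
    # Cubic with zero slope at both ends of the segment.
    return peak_torque * (3.0 * u**2 - 2.0 * u**3)

# Hypothetical control law: peak at 50% stride, 25% rise, 10% fall, 40 N*m peak.
law = dict(peak_time=50.0, rise_time=25.0, fall_time=10.0, peak_torque=40.0)
profile = [ankle_torque(p, **law) for p in range(0, 101, 5)]
print(profile)
```

Parameterizing by peak time, rise time and fall time, rather than by absolute onset and removal times, keeps every sampled parameter vector a valid hill-shaped curve as long as the rise and fall times are positive.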

We used limits determined in pilot tests to set constraints on the four control parameters that defined the pattern of ankle torque. Given the approximate nature of these hand-selected parameter constraints, it is possible that a larger solution space could be achieved with further refinement. For example, we reparameterized the optimization problem to improve the initial guess of the distribution defined by the covariance matrix, based on the results of pilot testing. Pilot tests suggested that the peak time and fall time, in units of percent stride, had smaller comfortable ranges than rise time, in units of percent stride, and peak torque, in units of newton-meters. We found that this reparameterization allowed subjects to complete the first generation with relative comfort, while covering a sufficient portion of the parameter space so as to span the eventual optimal parameter values. The solution space can be parameterized and constrained in other ways, which could be used to reflect desirable device characteristics such as energetic passivity.

As mentioned previously, we expect closed-loop augmentation policies to be useful for slow movements, postures such as standing balance, and working against gravity. We may find that combinations of open-loop and closed-loop policies are more effective than pure open-loop policies for normal and fast movements.

Proposed Research: Exploring Other Optimization Criteria

Metabolic rate (indirect respirometry [68]) was used as a measure of overall performance in preliminary studies. Successful optimization of metabolic rate suggests that alternate objective functions with similar properties could be optimized, for example criteria related to speed or balance. We will explore the use of center of mass kinetics (individual limbs) to capture inter-limb coordination effects, joint kinetics and kinematics (inverse dynamics) to estimate musculotendon force and work, muscle activity (electromyography (EMG)) to capture neuromuscular effects, and muscle fascicle mechanics (measured with ultrasound and predicted with our OpenSim model) to obtain fascicle work, all of which could be used to define more appropriate optimization criteria. User satisfaction [14], in terms of absolute ratings and comparisons, could also be used.

Optimizing co-robot assistance and augmentation based on measurements of the user is challenging. First, human measurements such as metabolic rate, muscle activity, and joint torques are noisy, owing both to complicated human physiological and mechanochemical dynamics and to shortcomings in measurement hardware. A second challenge is that evaluation of candidate conditions is very expensive in terms of time and human effort. Measurement of metabolic rate requires on the order of minutes of respiratory data from a human interacting with the device, due to delays in the expression of energy used by muscles in expired gases [68]. Often, multivariate optimization methods require a large number of function evaluations per step, and this number increases with the dimensionality of the control parameter space. A third challenge is that the nature of the relationship between co-robot control parameters and human physiological processes such as metabolic rate is not known in advance and may include complex nonlinearities and local minima. A fourth challenge is that humans exhibit complex, individualized learning and adaptation processes when using a co-robot. This is problematic for gradient-based and quadratic approximation methods, because calculating the gradient or quadratic requires substantial time, during which the human is changing. The calculated gradient or quadratic will often be inaccurate or out of date. Human adaptation is also difficult for methods that attempt to develop models of the space based on all available data, because data collected early in the adaptation process are likely to conflict with data from later in the process.

Proposed Research: Scaling Up To More Joints

A challenge is to increase the dimensionality of the optimization parameter space. In the preliminary work, we optimized four control parameters and therefore chose a population size of eight control laws per generation (using a formula recommended by the author of CMA-ES). This population size is intended to be robust and therefore applicable to a wide range of parameter spaces [32]. We aimed for four generations of optimization per participant, based on pilot tests suggesting that convergence was typically achieved in four generations or fewer. Three subjects experienced more or fewer generations of optimization. The optimized control law was defined by the final calculated (untested) mean parameter values.

Colleagues who use CMA-ES tend to use small population sizes (16) even in high-dimensional optimization. CMA-ES aggregates information across generations to compensate for the small sample size. We will test whether this approach will work in our context. Theoretical results on CMA-ES scaling with dimensionality suggest a logarithmic scaling law, so we are hopeful.
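The logarithmic growth of population size can be illustrated with a short sketch. It assumes that the formula referred to above is the commonly cited CMA-ES default, 4 + floor(3 ln n) for n parameters, which reproduces the population of eight used for four parameters. The walking-time estimate further assumes two minutes per evaluation and a fixed four generations. Holding the number of generations fixed is a simplification made here and is unlikely to hold in higher dimensions.

```python
import math

def default_popsize(n_params):
    """Assumed default CMA-ES population size: 4 + floor(3 * ln(n))."""
    return 4 + int(3 * math.log(n_params))

MINUTES_PER_EVALUATION = 2  # two minutes of respiratory data per control law

def walking_minutes(n_params, generations=4):
    """Walking time if every generation evaluates a full population."""
    return default_popsize(n_params) * MINUTES_PER_EVALUATION * generations

for n in (4, 8, 16, 32):
    print(n, default_popsize(n), walking_minutes(n))
```

For four parameters this gives eight control laws per generation and 64 minutes of walking over four generations, consistent with the preliminary study. Under the same assumptions, eight times as many parameters would less than double the population.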

Proposed Research: Understanding Co-Optimization

The optimization problem we are solving is complex because on each trial the human is also learning what the co-robot is doing, adapting to it, and in general optimizing their behavior for an optimization criterion that is different from the co-robot’s criterion. In our preliminary work we made some simplifying assumptions to tackle this problem. We kept the assistance fixed during any trial, rather than continually varying the assistance on each footstep (0.5 s) or time step (2 ms). We assumed the human would rapidly learn and adapt to the novel assistance, and treated the human as a stationary system on each trial. On each trial the co-robot applied a new policy, and the human learned what that policy was, and created their own policy in response. From the point of view of the co-robot the system being optimized is non-stationary across trials. We refer to this as co-optimization. When does this process converge to desirable optima? What undesired transient optimization behaviors need to be detected and reduced or eliminated? How does this limit or alter what the co-robot can do? How do we handle the more complex situation where the co-robot changes its policy more rapidly, such as on every time step?

There are many ways to approach these questions. Perhaps the most general is to formulate a model of the interaction as an incomplete-information repeated two-player game where each player is trying to learn the dynamics and optimization criteria of the other player [9, 29]. In this case the players are not fully competing or cooperating. The optimization criteria may be similar or unrelated. We know that even simple two-player games with learning can generate arbitrarily complex transients and even chaotic dynamics [66], so part of the proposed work will be trying to understand when our learning algorithms will converge, and have satisfactory transients. A related formulation is Multiagent Reinforcement Learning (MARL) [12], in which multiple interacting agents simultaneously attempt to learn to improve their policies.

There are a number of simplifying assumptions that can make analysis of the game or reinforcement learning context easier. The most aggressive is to assume the human is a fixed dynamic system, and does not adapt to the co-robot’s policy. A less aggressive assumption is that the human’s reaction to the co-robot’s change in policy will be linear in the policy parameterizations for the co-robot and the human policies. The human may be globally complex with learning and optimization, but locally the resulting adaptation is a linear function of the co-robot’s change in behavior. This approach can be used with more complex model structures for the human’s adaptation, such as a quadratic function, or some form of neural network. Another reasonable assumption is that the human is trying to keep the system behavior invariant, matching some desired behavior, similar to model reference adaptive control. Another simplifying assumption is that the human’s optimization criterion is the same as the co-robot’s, or is known. We propose to explore these different assumptions in the game and multiagent reinforcement learning contexts, searching for a model that can fit and then predict the human adaptation we see in our experiments.

There is related work in the areas of adaptive control, in which human pilots are driving or flying adaptive vehicles, and driver assistance, in which automobiles are tuning engine and vehicle behavior while drivers are responding to those changes.
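The locally linear adaptation assumption described above can be sketched as a one-parameter least-squares fit. The sketch assumes scalar policy parameters and a response line through the origin; the function names and the numbers in the usage note are hypothetical and are not experimental data.

```python
def fit_linear_response(robot_deltas, human_deltas):
    """Least-squares slope k (through the origin) for human_delta ~ k * robot_delta."""
    if len(robot_deltas) != len(human_deltas):
        raise ValueError("need one human change per robot change")
    numerator = sum(dr * dh for dr, dh in zip(robot_deltas, human_deltas))
    denominator = sum(dr * dr for dr in robot_deltas)
    if denominator == 0.0:
        raise ValueError("robot policy never changed; slope is unidentifiable")
    return numerator / denominator


def predict_human_delta(slope, robot_delta):
    """Predicted human policy change for a proposed co-robot policy change."""
    return slope * robot_delta
```

For example, with made-up changes `robot_deltas = [0.1, -0.2, 0.3, 0.05]` and `human_deltas = [-0.05, 0.1, -0.15, -0.025]`, the fitted slope is about -0.5, meaning the modeled human undoes roughly half of each co-robot change. A quadratic or neural network model would replace the single slope with a richer function fit to the same trial-to-trial changes.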

Proposed Research: The Muscle-Level Basis Of Co-Robotics

We would like to understand human adaptation to physical co-robotic interfaces. For example, lower-limb exoskeletons often produce odd adaptations in humans. This section describes our planned work on muscle-level mechanics and energetics, estimated in data-driven simulations of exoskeleton-assisted walking, which can potentially explain why. The next section explains our planned work in muscle mechanochemistry, in which we try to understand how current models of molecular mechanical behavior can improve our modeling of how humans adapt to physical co-robotic interfaces.

We plan to use Hill-type phenomenological (“curve-fit”) muscle models to explain why users adapt to and optimize physical co-robots the way they do. To illustrate how, we summarize some of our preliminary work in this area [39]. Using data from preliminary experiments, we performed electromyography-driven forward dynamic simulations of a musculoskeletal model to explore how changes in exoskeleton augmentation affected plantarflexor muscle-tendon mechanics, particularly for the soleus. We used a model of muscle energy consumption to estimate individual muscle and whole-body metabolic rate. As average exoskeleton torque was increased, while no net exoskeleton work was provided, soleus muscle fibers performed more positive mechanical work and experienced greater lengthening and shortening throughout the stance phase of gait. There was a 90% correlation between simulated estimates of average changes in whole body metabolic energy consumption and experimental measurements, providing confidence in our model estimates. Our simulation results suggest that the main benefits of the series tendon are to reduce positive work done by the muscle fibers by storing and returning energy elastically and to reduce the total excursion of the muscle fibers throughout stance.
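For readers unfamiliar with Hill-type models, the sketch below shows the general structure: fiber force is activation times a force-length curve times a force-velocity curve, plus a passive term. The curve shapes and every default parameter are generic textbook-style assumptions chosen for illustration; they are not the curves or parameters of the OpenSim model used in our simulations.

```python
import math


def active_force_length(norm_length, width=0.45):
    """Assumed Gaussian curve: 1.0 at optimal fiber length (norm_length = 1)."""
    return math.exp(-((norm_length - 1.0) / width) ** 2)


def passive_force_length(norm_length, stiffness=5.0, strain_at_fmax=0.6):
    """Assumed exponential passive curve: zero at or below optimal length."""
    if norm_length <= 1.0:
        return 0.0
    strain = (norm_length - 1.0) / strain_at_fmax
    return (math.exp(stiffness * strain) - 1.0) / (math.exp(stiffness) - 1.0)


def force_velocity(norm_velocity, shape=0.25, eccentric_max=1.4):
    """Assumed force-velocity curve.

    norm_velocity is fiber velocity divided by maximum shortening speed:
    negative = shortening, 0 = isometric, positive = lengthening.
    """
    if norm_velocity <= -1.0:
        return 0.0
    if norm_velocity <= 0.0:
        s = -norm_velocity  # shortening speed in [0, 1)
        return (1.0 - s) / (1.0 + s / shape)  # Hill hyperbola
    # Lengthening: force rises above isometric and saturates at eccentric_max.
    return 1.0 + (eccentric_max - 1.0) * norm_velocity / (norm_velocity + shape)


def hill_fiber_force(activation, norm_length, norm_velocity, max_isometric_force):
    """Fiber force in the units of max_isometric_force."""
    active = (activation
              * active_force_length(norm_length)
              * force_velocity(norm_velocity))
    return max_isometric_force * (active + passive_force_length(norm_length))
```

With these assumed curves, a fully activated fiber at optimal length produces less force while shortening and more force while lengthening than in the isometric case. Note that the curves are fixed functions of the current length and velocity, with no dependence on loading history.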

So far we have used a generic lower-body musculoskeletal model adapted from a previously published model (OpenSim musculoskeletal modeling software (v3.1) [22, 5]). The model includes the pelvis and both legs, with segments and degrees of freedom as defined in [5]. Of the original 35 lower-limb muscles in the model, we only include the muscles for which we had electromyographic data: lateral gastrocnemius, medial gastrocnemius, soleus, tibialis anterior, vastus medialis, rectus femoris, and biceps femoris long head. Because the electromyography-driven approach prescribes the joint kinematics of the model, omitting muscles did not invalidate our simulations. Muscle parameters were based on measurements of 21 cadavers [82]. The raw electromyographic data was first high-pass filtered (20 Hz), full-wave rectified, and low-pass filtered (6 Hz). It was then normalized to maximum muscle activity measured during normal walking, scaled, and delayed. We used the results of the electromyography-driven simulations to estimate the energy consumed by each muscle using a modified version of Umberger’s muscle metabolics model [79, 78, 77].
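The electromyography processing steps listed above can be sketched as follows. Only the cutoff frequencies (20 Hz and 6 Hz) and the order of operations come from the text; the first-order filter form, the function names, and the `gain` and `delay_s` parameters are assumptions made to keep the example short and dependency-free.

```python
import math


def high_pass(signal, cutoff_hz, dt):
    """First-order high-pass filter (assumed order; the text gives only the cutoff)."""
    if not signal:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    out = [0.0]
    for i in range(1, len(signal)):
        out.append(alpha * (out[-1] + signal[i] - signal[i - 1]))
    return out


def low_pass(signal, cutoff_hz, dt):
    """First-order low-pass filter (assumed order)."""
    if not signal:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    beta = dt / (rc + dt)
    out = [signal[0]]
    for x in signal[1:]:
        out.append(out[-1] + beta * (x - out[-1]))
    return out


def delay(signal, n_samples):
    """Shift the signal later in time by n_samples, padding the start with zeros."""
    n = min(max(n_samples, 0), len(signal))
    return [0.0] * n + signal[: len(signal) - n]


def emg_to_excitation(raw, dt, walking_max, gain=1.0, delay_s=0.0):
    """High-pass, rectify, low-pass, normalize, scale, and delay one EMG channel."""
    rectified = [abs(v) for v in high_pass(raw, 20.0, dt)]  # full-wave rectification
    envelope = low_pass(rectified, 6.0, dt)
    scaled = [gain * v / walking_max for v in envelope]  # normalize, then scale
    return delay(scaled, int(round(delay_s / dt)))
```

Here `walking_max` stands for the maximum processed activity of that muscle during normal walking, and `gain` and `delay_s` are per-muscle values that would be tuned rather than fixed in advance.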

From this type of preliminary study, we have come to believe that usefully interacting with biological muscles and tendons, via an external device, is much more difficult than expected. Tendon stiffness and other muscle-tendon properties seem to be tuned such that the biological ankle operates efficiently. Subtle disturbances to the system can result in undesirable changes in coordination patterns and whole body metabolic energy consumption. For example, providing increasing amounts of average exoskeleton torque, while maintaining zero net exoskeleton work, detuned soleus muscle-tendon interactions without compensating for reduced performance. Disrupted muscle-tendon interactions have similarly been observed in human hopping with ankle exoskeletons [64].

Assistive and augmentation devices should be designed and controlled to compensate for any compromised performance or functioning of biological mechanisms. Analyses similar to those discussed above can be used to help understand how different co-robot behaviors affect muscle-level mechanics, and provide insights into why certain device behaviors are more effective than others. For instance, torque support with a device can be an effective augmentation strategy [18], but subtleties of how the external torques are applied and how the device interacts with the biological system greatly impact coordination patterns and overall effectiveness.

Figure 4: Left: Comparison of simulated (predicted) and inverse-dynamics-derived (“measured”) ankle joint mechanics. Top row: Simulated muscle-generated ankle joint moments compared to inverse-dynamics-derived ankle joint moments. Bottom row: Simulated muscle-generated ankle joint powers compared to inverse-dynamics-derived ankle joint powers. Simulated muscle-generated joint moments and powers were calculated by summing the individual contributions of the exoskeleton-side lateral gastrocnemius, medial gastrocnemius, soleus, and tibialis anterior. Each line is the subject mean (N = 8) for a given condition. Conditions with increasing average exoskeleton torque are shown in green. Conditions with increasing net exoskeleton work rate are shown in purple. Darker colors indicate higher values. Normal walking, without an exoskeleton, is shown by the gray dashed line. All values were normalized to body mass. For reference, exoskeleton torque trajectories for each of the different conditions can be found in Figure 4 in [38]. Right: A cartoon of the two possible crossbridge cycles. Green arrows show rapid transitions. Arrows with other colors show proposed controlled transitions. The gold arrow shows where calcium ions (Ca++) control the attachment of myosin heads (M, blue block) to the actin (A, yellow block) thin filaments. The red arrow shows where low crossbridge (XB) strain permits transitions to the UNLOCKED state. The brown arrow indicates where crossbridge strain that exceeds a mechanical limit causes detachment and slipping. The faint blue cycle shows the traditional dominant pathway when the muscle is shortening. The faint green cycle shows the proposed dominant pathway when the muscle is lengthening, a cycle of detachment and attachment that leads to a viscous-like resistance to changes in muscle length, without using ATP. In the loaded isometric case the crossbridges remain in the locked state. The purple line is the lever arm of the myosin head, and the red line is a hypothesized elastic element.

The modeling approaches used in this study can be applied to a wide array of human motions. The results suggest that, given a coordination pattern, via measured muscle activity and joint kinematics, it is possible to generate reasonable estimates of energy use and other physiological parameters. In the future it may be possible to invert the process. Based on what we know about the mechanics and energetics of individual muscles, we can try to generate a set of desirable coordination patterns. It may even be possible to prescribe co-robot behaviors that elicit these desirable changes in coordination.

Although the results produced from this approach seem reasonable, there are a number of limitations. If the parameters used in the model were inaccurate, this could have led to invalid estimates of muscle mechanics and energetics. The parameters we used are, however, comparable to previously published work [4, 5], which is based on cadaver studies [82]. Furthermore, to validate the approach, we compared muscle-generated ankle joint moments and powers to inverse-dynamics-derived ankle joint moments and powers (Figure 4). We optimized parameters to reduce the root-mean-square error between the two, and an in-depth sensitivity analysis shows that the qualitative trends are robust to model parameters.
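The validation metric described above, root-mean-square error between simulated and inverse-dynamics-derived joint moments, can be sketched as follows. The single scalar `gain` being searched is a hypothetical stand-in for whichever model parameters are optimized, and the grid search is only one simple way to reduce the error; neither is claimed to be the procedure used in the study.

```python
import math


def rms_error(simulated, measured):
    """Root-mean-square error between two equally sampled time series."""
    if len(simulated) != len(measured) or not simulated:
        raise ValueError("need two non-empty series of equal length")
    squared = [(s - m) ** 2 for s, m in zip(simulated, measured)]
    return math.sqrt(sum(squared) / len(squared))


def best_gain(unit_moment, measured_moment, candidate_gains):
    """Grid search over one hypothetical scalar model parameter.

    unit_moment is the simulated moment for gain = 1; the simulated moment
    is assumed to scale linearly with the gain.
    """
    return min(
        candidate_gains,
        key=lambda g: rms_error([g * u for u in unit_moment], measured_moment),
    )
```

In practice several parameters would be adjusted together, and the sensitivity analysis would check that qualitative trends persist as each one is varied around its optimized value.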

We are also limited by the number of muscles we can measure. Results from this study still produced reasonable estimates of metabolic energy consumption. Including more muscles in future experiments would make these analyses more complete.

Muscle-generated ankle joint mechanics did not perfectly match inverse-dynamics-derived ankle joint mechanics. Most trends were consistent across the two methods. Results from inverse dynamics, however, suggested that total exoskeleton-side positive ankle joint work decreased as average exoskeleton torque increased, while results from the forward simulations suggested that total exoskeleton-side positive ankle joint work remained relatively unchanged. This inconsistency could have implications for our understanding of why contralateral-limb knee mechanics and vastus metabolic energy consumption were affected by torque applied at the exoskeleton-side ankle joint. These results illustrate the importance of knowing the limitations and assumptions inherent in a model and taking these into consideration when analyzing and interpreting outputs from a model. To account for these limitations, we minimized inconsistencies between inverse-dynamics-derived and muscle-generated joint mechanics by optimizing those model parameters in which we had the least confidence.

We believe this type of work is the first data-driven investigation of changes in muscle mechanics during walking with an exoskeleton using reasonably accurate muscle and tendon mechanical models. We were able to explain experimentally-observed changes in coordination patterns and metabolic energy consumption. Models without muscles and tendons would not have been able to capture these effects. We expect the results from this type of study applied to more complex co-robots and the entire human body to lead to greater insight into the functioning of muscle-tendon units and to guide the design and control of co-robots that interact effectively with these biological mechanisms.

We will also use our co-robot testbeds for perturbation studies to evaluate our understanding of how the musculoskeletal system works.

Proposed Research: The Molecular Basis Of Co-Robotics

In a Hill-type phenomenological muscle model, curve-fits are used to justify the claim that reducing the total excursion of the muscle fibers throughout stance results in less energy usage. We believe that modern Huxley-type models that take into account the molecular mechanochemistry can now predict this effect more accurately (Figure 4) [48, 56, 57, 61, 17, 55, 70, 49, 13]. Furthermore, it is becoming clear that muscle lengthening while generating force is very different from muscle shortening while generating force. We believe that Huxley-type models can explain this effect as well. The research described in this section attempts to bridge muscle molecular mechanisms and human-co-robot physical interaction, and stems from our attempts to more accurately model energy use, muscle force generation, muscle stiffness, and muscle resistance to motion.

The key issue is that muscle uses much less energy to generate the same force when it holds a position or resists lengthening as compared to generating the same force while shortening. This is not true of electric motors. Their energy use is proportional to the absolute value of motor current, and thus torque. Hydraulic energy use is proportional to the absolute value of joint velocity, in addition to a large constant term due to internal leakage. How does muscle not burn a lot of energy under load when the muscle (not including the tendon) is staying at the same length (isometric) or lengthening under load (doing negative work)? The crossbridges are bathed in stored energy (Adenosine Triphosphate, ATP) all the time, so it is not that the energy source is removed. If ATP hydrolysis is used for crossbridge detachment, why doesn’t any form of crossbridge cycling (which has to happen in muscle lengthening under load) burn energy [60]?
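The motor side of this contrast can be made concrete with a small sketch that follows the simplification stated above, in which motor energy use is proportional to the absolute value of torque. The proportionality constant `cost_per_nm_s` and the function name are hypothetical, and real motor losses are more complicated than this.

```python
def motor_cost_and_work(torques, velocities, dt, cost_per_nm_s=1.0):
    """Return (motor energy cost, mechanical work) over a sampled trajectory.

    Cost follows the simplified model: proportional to |torque| at each sample.
    Work is the usual integral of torque times joint velocity.
    """
    cost = sum(cost_per_nm_s * abs(t) for t in torques) * dt
    work = sum(t * w for t, w in zip(torques, velocities)) * dt
    return cost, work
```

Holding 10 Nm for one second costs the same in this model whether the joint velocity is +1 rad/s (positive work), 0 rad/s (isometric), or -1 rad/s (negative work), while the mechanical work is +10 J, 0 J, and -10 J respectively. Muscle, by contrast, appears to pay much less in the second and third cases.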

It has been recently confirmed that skeletal muscle myosin shares with other myosin isoforms a catch-bond property. The myosin head is much less likely to detach from the actin filament if the crossbridge is stretched. In addition, it has been hypothesized that there are rapid crossbridge detachment and reattachment processes that do not require the use of stored energy in the form of Adenosine Triphosphate (ATP). These processes allow a crossbridge to translate or “jump” along the actin filament to continue to resist a muscle being lengthened. The catch-bond and jump properties allow muscle to macroscopically act as a brake with little use of energy when lengthening, as well as like a spring for small perturbations. These properties also imply that large scale use of ATP only occurs when the muscle is shortening, and only when the muscle is doing positive work as a unidirectional motor.
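One simple way to express the catch-bond and slipping ideas in a crossbridge model is a strain-dependent detachment rate. The exponential form and every numerical value below are illustrative assumptions only; they are not measured rate constants and are not taken from the cited Huxley-type models.

```python
import math


def detachment_rate(strain_nm, base_rate=100.0, catch_scale_nm=2.0,
                    limit_nm=10.0, slip_rate=1.0e4):
    """Assumed crossbridge detachment rate (1/s) as a function of stretch.

    - Unstretched or compressed crossbridge: detaches at base_rate.
    - Stretched crossbridge: rate falls exponentially (catch-bond behavior).
    - Stretch beyond limit_nm: forced detachment and slipping at slip_rate.
    """
    if strain_nm > limit_nm:
        return slip_rate
    if strain_nm <= 0.0:
        return base_rate
    return base_rate * math.exp(-strain_nm / catch_scale_nm)


def mean_attached_time(strain_nm):
    """Expected attached lifetime (s) if the rate stays constant."""
    return 1.0 / detachment_rate(strain_nm)
```

Under these assumptions a stretched crossbridge stays attached longer than an unstretched one until the mechanical limit is exceeded, at which point it detaches quickly and can reattach further along the filament.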

We will augment the Hill-type models discussed in the previous section with the effects predicted by modern Huxley-type models [48, 56, 57, 61, 17, 55, 70, 49, 13]. We will use whole limb data from our experiments to “curve-fit” the augmented models. We believe that the improved augmented models will do a better job explaining what a human user is adapting towards during our experiments. We believe the view in biomechanics and neuroscience of muscle as a spring with fixed length-tension and force-velocity curves selected by activation is questionable. While tissues other than crossbridges may have fixed (but nonlinear) spring-like properties and generate the same force when returned to the same position and velocity, the macroscopic behavior of a set of crossbridges can be different depending on the history of the muscle activation and load, in addition to the current muscle length and velocity. Understanding the mechanochemistry of muscle allows us to go beyond current phenomenological (curve-fit) Hill-type muscle models to better explain metabolic costs, force-velocity relationships, short range stiffness, force enhancement, catch phenomena, and other muscle properties and nonlinearities, and make more accurate muscle models to simulate and predict behavior, as well as provide therapy, rehabilitation, and physical assistance and augmentation.

Evaluation

We will pursue an innovative approach to evaluation, building on our prior work. Instead of prematurely committing to a particular co-robot design, we propose building two physical co-robot emulators for lower-body co-robots (Figure 1). The first emulator uses very powerful and fast benchtop actuators to physically simulate a wide range of co-robot designs, and allows us to explore and evaluate many proposed co-robots and co-robot control schemes, before investing the effort and resources to actually build each one. The second co-robot emulator uses a similar lower-body structure, but uses smaller motors and a power source (batteries) in a backpack, allowing realistic tests on outdoor irregular terrain. The laboratory evaluation system is being designed and constructed. We request funding for Humotech, a commercial spinoff from the Collins lab, to design and build the portable co-robot emulator.

These testbeds will allow us to explore a range of co-robot physical interaction customization strategies in a range of behaviors including level and inclined and unloaded and loaded walking and moderate speed running over irregular terrain, as well as scrambling over boulders, climbing and descending steep slopes and rock faces, and walking, running, and jumping across stepping stones and pole tops. We will build on evaluation metrics we have used, such as metabolic cost, amount of muscle electrical activity (EMG), magnitude of co-robot forces on the user, magnitude of user internal forces and torques, features (such as asymmetry) of user kinematic patterns, and user self reports.

Mechanical Design: The co-robot testbeds will be designed based on principles and techniques we have developed and experimentally validated over the past five years. Ankle end-effectors will be refined versions of our current successful devices. Knee end-effectors will be refined versions of current prototypes, redesigned for lower mass to allow running. Hip end-effectors are planned to include a revolute flexion-extension joint, which allows direct sensing of joint angle and reduces reaction forces applied to the user’s back. The hip joint will also include a passive flexure for ad-abduction, which will allow the subject to move their foot mediolaterally for balance but not add substantially to worn mass. The torso frame will contact the pelvis and shoulders, providing a large moment arm that results in low contact forces for a given applied torque. Contacting the back and pelvis in this way also results in normal contact forces, as opposed to shear contact forces, which results in a stiffer and more comfortable interface. Widely spacing the contact points further reduces slop at the interface, since it reduces the angular displacement for a given linear displacement at the contact points. Our specific design goals are as follows.

Design goals: speed and bandwidth: High exoskeleton speed is essential to avoid interference with natural motions of the limbs, especially during leg swing. The maximum speed of our current ankle exoskeleton is 16 rad/s. Closed-loop torque bandwidth of the current laboratory testbed with benchtop actuation (38 Hz) is about three times greater than that of human muscle and exceeds values of all other exoskeletons capable of generating similar magnitudes of torque (due to remotizing the actuation). A design goal for the outdoor testbed is greater than 2 revolutions/second maximum velocity at each joint, in order to support rapid error responses and moderate speed running [63].

Design goals: maximum torques: We will seek to maximize co-robot joint torques given a weight budget of about 1 kg/joint. Based on our prior augmentation experiments [16, 18, 38] we estimate that optimal augmentation will be achieved with exoskeleton torques that are about half the values observed during unassisted locomotion (50-100 Nm depending on the joint), so this is our minimum design goal. Peak torque in excess of these values will be useful. Our current treadmill-based ankle exoskeleton is more powerful than this, with a maximum torque of 120 Nm.

Design goals: weight: Our current ankle exoskeleton weighs 0.8 kg. Weight limits for the full indoor system are as follows: The entire worn portion of the system is expected to weigh 6 kg. Each foot-ankle-shank section is expected to weigh 0.75 kg, based on current, established hardware. Each knee-thigh section is expected to weigh 1.0 kg, based on a current prototype. Each hip-torso section is expected to weigh 1.25 kg (2.5 kg for both hips and the torso), extrapolating from ankle and knee hardware. Low mass is the result of off-board power and control together with careful mechanical design.
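The numeric design goals stated above can be cross-checked with a few lines of arithmetic. The values are taken from the text; the variable and function names are only for illustration.

```python
import math

# Per-side section masses from the weight goals (kg).
SECTION_MASS_KG = {
    "foot-ankle-shank": 0.75,
    "knee-thigh": 1.0,
    "hip-torso": 1.25,
}


def worn_mass_kg(section_mass_kg, sides=2):
    """Total worn mass for a bilateral system."""
    return sides * sum(section_mass_kg.values())


def rev_per_s_to_rad_per_s(rev_per_s):
    """Convert revolutions per second to radians per second."""
    return rev_per_s * 2.0 * math.pi


if __name__ == "__main__":
    print("worn mass (kg):", worn_mass_kg(SECTION_MASS_KG))
    print("outdoor speed goal (rad/s):", round(rev_per_s_to_rad_per_s(2.0), 2))
```

The section masses sum to the stated 6 kg for the worn portion of the system. The outdoor speed goal of 2 revolutions/second is about 12.6 rad/s, which is below the 16 rad/s maximum speed of the current ankle exoskeleton.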

Design goals: range of motion: The range of motion of each exoskeleton emulator joint will exceed values observed during walking, running and sprint running. We will include additional range in hip flexion, knee flexion and ankle plantarflexion, which will allow for changes in kinematics following adaptation to use of the exoskeleton. Hip extension, knee extension and ankle dorsiflexion will be limited to the ranges observed during walking and running, which correspond to natural limits, to avoid hyperextension. All limits will correspond to hard stops on the device, after which further torque development that could injure the user is not possible.

Design goals: sensing: All joints will be instrumented with high-resolution encoders to measure joint angle. All Bowden cable termination points will be instrumented with strain gages to measure torque. We have developed reliable strain gage instrumentation approaches that allow torque to be measured at 500 Hz with less than 1% measurement error.

Research Plan

Our experimental work provides whole-limb and whole-body data that informs our modeling work and our exploration of alternative optimization approaches, as well as helps us understand how the human motor control system works. Better models and model-driven optimization improve our experimental results. Our research plan focuses on this cycle.

This research involves one postdoc and two students. We expect the postdoc to take a lead role in coordinating the project and performing the evaluation. We expect one student to focus on optimization and co-optimization issues (involving applied mathematics and optimal control) and one student to focus on muscle modeling (involving mechanochemistry, physiology, and biomechanics).

Year 1: Continue experiments with existing laboratory testbed (bilateral ankles). Improve optimization approach in simulation using current Hill-type muscle model. Develop a library-based interaction policy approach, with online policy tuning using current Hill-type muscle model. Build full lower-body laboratory testbed and initiate experiments. Build outdoor bilateral ankle testbed and initiate experiments. Develop co-optimization theory, and extensively test proposed algorithms in simulation using current Hill-type muscle model. Improve current Hill-type muscle model. Develop molecular and sarcomere-level muscle modeling. Begin evaluation.

Year 2: Continue full lower-body laboratory testbed experiments with improved optimization approach and with policy tuning. Improve optimization approach in simulation using improved Hill-type muscle model. Improve library-based interaction policy approach and online policy tuning using improved Hill-type muscle model. Continue outdoor bilateral ankle testbed experiments with improved optimization approach and with policy tuning. Build full lower-body outdoor testbed and initiate experiments. Continue to develop co-optimization theory, and extensively test proposed algorithms in simulation using improved Hill-type muscle model. Extend molecular and sarcomere-level muscle modeling to whole muscle and lower-body modeling (Huxley-type model). Continue evaluation.

Year 3: Continue full lower-body laboratory testbed experiments with improved optimization approach and with policy tuning. Improve optimization approach in simulation using Huxley-type muscle model. Improve library-based interaction policy approach and online policy tuning using Huxley-type muscle model. Continue full lower-body outdoor testbed experiments with improved optimization approach and with policy tuning. Continue to develop co-optimization theory, and extensively test proposed algorithms in simulation using Huxley-type muscle model. Continue evaluation.

Prior NSF Supported Work

Atkeson: NSF Award IIS-1563807 (PI: Geyer, Atkeson co-PI), 8/1/16-7/31/20. $317,007 in year 1. RI: Medium: Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior, started Aug 1, 2016, and has been active for a few months. Because this award is so new, we report on a previous NSF award.

Atkeson: (a) NSF award: IIS-0964581 (PI: Hodgins, Atkeson co-PI); amount: $699,879; period: 7/1/10 - 6/30/14. (b) Title: RI: Medium: Collaborative Research: Trajectory Libraries for Locomotion on Rough Terrain. (c) Summary of Results: This grant has supported work on a variety of approaches to controlling humanoid robots based on trajectory libraries.

Intellectual Merit: The primary goal of our work on Trajectory Libraries for Locomotion on Rough Terrain was to develop control systems for humanoid robots that show human levels of competence, robustness and flexibility in locomotion on human-scale rough terrain, and explore library approaches to generating behavior. Results include: 1) A hierarchical approach to online optimal control of behavior. Low level behavior (accelerations, joint torques, and contact forces) is optimized on a very fast time scale (1 ms) using quadratic programming. Longer term behavior (center of mass trajectory, for example) is optimized for several seconds using Differential Dynamic Programming (a 2nd order gradient based trajectory optimization technique). This work received a “Best Oral Paper Award” at Humanoids 2013. 2) Multiple model policy optimization - algorithms for training robust policies/controllers using multiple models. This approach forms the basis for new algorithms to learn and optimize policies using model-based approaches, greatly speeding up reinforcement learning and developing a new approach to robust learning. 3) Learning of optimized trajectories - simple learning approaches to concisely represent and accurately predict optimal trajectories. 4) Globally optimal control of instantaneously coupled systems (ICS), which is designed by coordinating multiple lower-dimensional optimal controllers. We augmented subsystems of the ICS with coordination variables, and then used value functions to coordinate the augmented subsystems by managing tradeoffs of the coordination variables. 5) We discovered that many features of optimized walking including costs can be fit using simple global function approximation, such as quadratic function approximation. 6) A paradigm for designing controllers for complex systems, “informed priority control”, which coordinates multiple sub-policies. These tools enable us to prototype complex control system designs faster. 7) State estimation for mobile and humanoid robots with “floating body” dynamics.

We have made contributions to planning and control of human-like motion for humanoid characters and robots. We have made contributions to the control of low impedance robots and similar systems. We have made contributions to understanding human standing balance and walking. Our findings have contributed to progress in planning algorithms, and also progress in nonlinear optimal control. More specifically we have contributed to the planning and optimal control of high dimensional systems such as humanoid robots. Our work led to an excellent performance of our team (WPI-CMU) in the DARPA Robotics Challenge, where we were the only robot to try all tasks and not fall down or need to be rescued by humans.

Broader Impacts. We developed more useful robots and knowledge that may help with a significant social problem: why people fall and injure themselves. We coordinated our outreach activities with the larger outreach efforts of CMU’s NSF Engineering Research Center on Quality of Life Technology to scale up reach and effectiveness. Our technologies are being shared by being published, and papers are available electronically. The technologies were demonstrated on entries in the DRC. We created and taught relevant courses, including an undergraduate course on humanoid robotics and a graduate course on optimization of behavior. We put the teaching materials on the web to widely disseminate the results. Our work is being used by Disney Research and through this technology transfer will eventually be used in entertainment and education applications, and will be available to and inspire the public. A graduate student participated in a Discovery Channel TV series, “The Big Brain Theory: Pure Genius”. One purpose of the TV series was getting people excited about engineering. A graduate student supported an all-female group of high school students in the FIRST robotics competition. A robot character in a Disney movie was inspired by our work on soft robots (Baymax in Big Hero 6) [7].

Development of Human Resources. The project involved five PhD graduate students and four postdocs. We had weekly individual meetings, weekly lab meetings, and we individually mentored all participants. All participants actively did research, made presentations to our group, gave conference presentations, and gave lectures in courses. The students served as teaching assistants.

(d) Publications resulting from this NSF award: [93, 58, 1, 47, 92, 67, 76, 73, 75, 88, 95, 71, 65, 2, 96, 6, 45, 84, 90, 35, 51, 3, 86, 74, 83, 88, 85, 41, 25]. This award led to further work supported by DARPA: [26, 27, 28, 89, 91, 11, 10, 52, 8].

(e) Other research products: We have made our motion data available in the CMU motion capture database [34].

(f) Renewed support. This proposal is not for renewed support.

Collins: (a) NSF award: CMMI-1300804 (PI: Collins); amount: $216,740; period: 8/15/13 - 7/31/16. (b) Title: “Collaborative Research: User-Optimal Robotic Prosthesis Design” (c) Summary of Results:

Intellectual Merit: This project developed two prosthetic foot emulators, which are exceptionally versatile hardware systems that speed and systematize prosthesis design. Experiments with these emulators revealed surprising relationships between two key prosthesis features and amputee energy economy, identified three new control techniques that improve amputee balance, and demonstrated a new method for identifying the prosthesis features that optimize patient satisfaction.

Broader Impacts: Improved prosthesis designs and fitting tools arising as a result of this project are expected to lead to improved mobility and quality of life for hundreds of thousands of Americans with disabilities arising from amputation. Benefits are expected to be particularly high for individuals with amputation arising as a result of trauma, for example among veterans returning from combat. The prescription tools developed in this project are expected to improve the efficiency of healthcare delivery for people with amputation by providing objective justification data. The prosthesis emulator systems developed as a part of this project have been commercialized by a small business, Human Motion Technologies, L.L.C. (Humotech), started by a former PhD student and project participant.

Development of Human Resources. The project involved two PhD graduate students, three Master’s students and one postdoc, including one female PhD student and one underrepresented minority Master’s student. The PI held weekly individual meetings and weekly lab meetings with all mentees. Project participants actively performed research, made presentations to our group and to a multi-group seminar organized by the PI, and gave conference presentations. PhD students also served as teaching assistants in courses on design and biomechatronics.

(d) Publications resulting from this NSF award: [16, 15, 14, 18, 38, 39, 42, 44, 43, 54, 62, 87, 37]

(e) Other research products: Other research products, including data sets, videos and design guides, can be found on the PI’s laboratory website: http://biomechatronics.cit.cmu.edu

(f) Renewed support. This proposal is not for renewed support.

Broader Impacts [of the Proposed Work]

We expect this work to provide customized assistance for people with disabilities, and customized augmentation for older adults, workers, and soldiers. We expect customized assistance and augmentation to be much more effective than manually designed assistance and augmentation. A long term goal is to customize physical training as well.

Education and Outreach. A major outreach initiative led by Atkeson is the creation of a physical and virtual Robot Museum. So far we have created exhibits on juggling robots, robot actuation (gears vs. direct drive), mobile robots, soft robots, Steve Jacobsen and Sarcos, robots in literature, legged robots, computer graphics (Ivan Sutherland), and AI (Newell and Simon). Our next major initiatives are 1) to develop cell phone apps that trigger off augmented reality (AR) tags and robot pictures in halls to provide a self-guided tour of the Robotics Institute, and 2) to use virtual reality (VR) to provide access to our collection from anywhere in the world. We want anyone to be able to design, build, debug, evaluate, and repair a historical robot in virtual reality. The impact of Atkeson’s work will be increased by a new Disney TV show (premieres Fall 2017) based on the characters from the Disney movie Big Hero 6, including the inflatable medical robot Baymax inspired by Atkeson’s work on inflatable robots. We have coordinated our outreach activities with the larger outreach efforts of CMU’s Robotics Institute to scale up reach and effectiveness. Our technologies are shared through publication, and papers and software are available electronically.

Development of Human Resources. This project will fund a series of postdoctoral fellows and graduate students. See the Postdoc Mentoring statement and Coordination Plan for more information.

Participation of Underrepresented Groups. We will make use of ongoing efforts in the Robotics Institute and CMU-wide. These efforts include supporting minority visits to CMU, recruiting at various conferences and educational institutions, and providing minority fellowships. As the Robotics Institute PhD admissions chair in 2016, Atkeson led a process which resulted in 31% of acceptances going to female applicants. As a member of the Robotics Institute hiring committee in 2016, Atkeson participated in a process that led to 10 out of 18 interviewees being female. Atkeson is assisting efforts at CMU to raise money for fellowships for students who will help us in our efforts to serve diverse populations and communities, including our own.

Dissemination Plan. For a more complete description of our dissemination plan, see our Data Management Plan. We will maintain a public website to freely share our simulations and control code, and to document research progress with video material. We will present our work at conferences and publish it in journals, and will use these vehicles to advertise our work to potential collaborators in science and industry.

Technology Transfer. Our research results and algorithms are being used by Disney Research and through this technology transfer path will eventually be used in entertainment and education applications, and will be available to and inspire the public. Two recent postdocs work at Disney Research. Three recent students work at Boston Dynamics transferring our work to industrial applications, one recent student and recent postdoc work on self-driving cars at Uber, one recent student works on self-driving cars at Apple, and one recent student works on humanoid robotics at the Toyota Research Institute. An older former student is the CTO of the Amazon drone effort. Several older former students work at Google. We are thrilled that we and our students are part of the robotics revolution.

Curriculum Development Activities

We will develop course material on robot learning and reasoning, which will be directly influenced by the planned activities of this proposal and made freely available on the web. The PIs currently teach several courses that will benefit from this material. For example, 16-745: Dynamic Optimization and 16-711: Kinematics, Dynamics, and Control directly address the research areas in which this proposal is embedded. We also teach a course designed to attract undergraduates into the field, 16-264: Humanoids. All of these courses emphasize learning from interaction by working with real robots.

References

[1] S.O. Anderson and J.K. Hodgins. Adaptive torque-based control of a humanoid robot on an unstable platform. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages 511–517, 2010.

[2] S.O. Anderson and J.K. Hodgins. Informed priority control for humanoids. In Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on, pages 416–422, 2011.

[3] Stuart Anderson. The Design of Control Architectures for Force-controlled Humanoids Performing Dynamic Tasks. PhD thesis, Robotics Institute, Carnegie Mellon University, 2013.

[4] A. S. Arnold, D. J. Asakawa, and S. L. Delp. Do the hamstrings and adductors contribute to excessive internal rotation of the hip in persons with cerebral palsy? Gait Post., 11:181–190, 2000.

[5] E. M. Arnold, S. R. Ward, R. L. Lieber, and S. L. Delp. A model of the lower limb for analysis of human movement. J. Biomed. Eng., 38:269–279, 2010.

[6] C. G. Atkeson. Efficient robust policy optimization. In American Control Conference, 2012.

[7] C. G. Atkeson. Big Hero 6: Let’s Build Baymax. build-baymax.org, 2015.

[8] C. G. Atkeson, B. P. W. Babu, N. Banerjee, D. Berenson, C. P. Bove, X. Cui, M. DeDonato, R. Du, S. Feng, M. Gennert, J. P. Graff, P. He, A. Jaeger, J. Kim, K. Knoedler, L. Li, C. Liu, X. Long, T. Padir, F. Polido, G. G. Tighe, and X. Xinjilefu. NO FALLS, NO RESETS: Reliable humanoid behavior in the DARPA Robotics Challenge. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2015.

[9] R. J. Aumann, M. Maschler, and R. E. Stearns. Repeated games with incomplete information. MIT Press, 1995.

[10] Benzun Pious Wisely Babu, Ruixiang Du, Taskin Padir, and Michael Gennert. Improving robustness in complex tasks for a supervisor operated humanoid. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2015.

[11] Nandan Banerjee, Xianchao Long, Ruixiang Du, Felipe Polido, Siyuan Feng, Christopher G. Atkeson, Michael Gennert, and Taskin Padir. Human-supervised control of the ATLAS humanoid robot for traversing doors. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2015.

[12] L. Busoniu, R. Babuska, and B. De Schutter. Multi-agent reinforcement learning: An overview. In Studies in Computational Intelligence: Innovations in Multi-Agent Systems and Applications, volume 310, pages 183–221, 2010.

[13] S. G. Campbell, P. C. Hatfield, and K. S. Campbell. A mathematical model of muscle containing heterogeneous half-sarcomeres exhibits residual force enhancement. PLoS Computational Biology, 7(9), 2011.

[14] J. M. Caputo, P. G. Adamczyk, and S. H. Collins. Informing ankle-foot prosthesis prescription through haptic emulation of candidate devices. In IEEE International Conference on Robotics and Automation (ICRA), 2015.

[15] J. M. Caputo and S. H. Collins. Prosthetic ankle push-off work reduces metabolic rate but not collision work in non-amputee walking. Nature Scientific Reports, 4, 2014.

[16] J. M. Caputo and S. H. Collins. A universal ankle-foot prosthesis emulator for human locomotion experiments. ASME Journal of BioMechanical Engineering, 136, 2014.

[17] M. Caremani, L. Melli, M. Dolfi, V. Lombardi, and M. Linari. Force and number of myosin motors during muscle shortening and the coupling with the release of the ATP hydrolysis products. J Physiol, 539(15):3313–3332, 2015.

[18] S. H. Collins, M. Kim, T. Chen, and T. Chen. An ankle-foot prosthesis emulator with control of plantarflexion and inversion-eversion torques. In IEEE International Conference on Robotics and Automation, pages 1210–1216, 2015.

[19] S. H. Collins and A. D. Kuo. Recycling energy to restore impaired ankle function during human walking. Public Library of Science: ONE, 5(e9307), 2010.

[20] S. H. Collins, M. B. Wiggin, and G. S. Sawicki. Reducing the energy cost of human walking using an unpowered exoskeleton. Nature, 522:212–215, 2015.

[21] W. Cornwall. In pursuit of the perfect power suit. Science, 350:270–273, 2015.

[22] S. L. Delp, F. C. Anderson, A. S. Arnold, P. Loan, A. Habib, C. T. John, E. Guendelman, and D. G. Thelen. OpenSim: open-source software to create and analyze dynamic simulations of movement. Trans. Biomed. Eng., 54:1940–1950, 2007.

[23] D. J. Farris, B. D. Robertson, and G. S. Sawicki. Elastic ankle exoskeletons reduce soleus muscle force but not work in human hopping. J. Appl. Physiol., 115:579–585, 2013.

[24] W. Felt, J. C. Selinger, J. M. Donelan, and C. D. Remy. Body-in-the-loop: optimizing device parameters using measures of instantaneous energetic cost. PLoS ONE, 10(e0135342), 2015.

[25] S. Feng, X. Xinjilefu, W. Huang, and C. G. Atkeson. 3D walking based on online optimization. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2013.

[26] Siyuan Feng, Eric Whitman, X Xinjilefu, and Christopher G. Atkeson. Optimization Based Full Body Control for the Atlas Robot. In Proc IEEE Conf Humanoids, Madrid, Spain, 2014.

[27] Siyuan Feng, Eric Whitman, X Xinjilefu, and Christopher G. Atkeson. Optimization-based full body control for the DARPA Robotics Challenge. Journal of Field Robotics, 32(2):293–312, 2015.

[28] Siyuan Feng, X Xinjilefu, Christopher G. Atkeson, and Joohyung Kim. Optimization based controller design and implementation for the Atlas robot in the DARPA Robotics Challenge Finals. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2015.

[29] D. Fudenberg. The Theory of Learning in Games. MIT Press, 1998.

[30] K. E. Gordon and D. P. Ferris. Learning to walk with a robotic ankle exoskeleton. J. Biomech., 40:2636–2644, 2007.

[31] M. L. Handford and M. Srinivasan. Robotic lower limb prosthesis design through simultaneous computer optimizations of human and prosthesis costs. Sci. Rep., 6, 2016.

[32] Nikolaus Hansen. The CMA evolution strategy: a comparing review. In J.A. Lozano, P. Larranaga, I. Inza, and E. Bengoetxea, editors, Towards a new evolutionary computation, volume 192, pages 75–102. Springer, 2006.

[33] H. M. Herr and A. M. Grabowski. Bionic ankle-foot prosthesis normalizes walking gait for persons with leg amputation. Proc. Roy. Soc. B, 279:457–464, 2012.

[34] J Hodgins. mocap.cs.cmu.edu.

[35] Weiwei Huang, Junggon Kim, and Christopher G. Atkeson. Energy-based optimal step planning for humanoids. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, pages 3124–3129, 2013.

[36] A. J. Ijspeert. Biorobotics: Using robots to emulate and investigate agile locomotion. Science, 346:196–203, 2014.

[37] J. Zhang, C. C. Cheah, and S. H. Collins. Torque control in legged locomotion. In Bio-Inspired Legged Locomotion: Concepts, Control and Implementation. Elsevier, 2016.

[38] R. W. Jackson and S. H. Collins. An experimental comparison of the relative benefits of work and torque assistance in ankle exoskeletons. J. Appl. Physiol., 119:541–557, 2015.

[39] R. W. Jackson, C. L. Dembia, S. L. Delp, and S. H. Collins. Muscle-tendon mechanics explain unexpected effects of exoskeleton assistance on metabolic rate during walking. J. Exp. Biol., page in review, 2017.

[40] H. Kazerooni, R. Steger, and L. Huang. Hybrid control of the Berkeley lower extremity exoskeleton (BLEEX). Int. J. Rob. Res., 25:561–573, 2006.

[41] J. Kim, N. S. Pollard, and C. G. Atkeson. Quadratic encoding of optimized humanoid walking. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2013.

[42] M. Kim, T. Chen, T. Chen, and S. H. Collins. An ankle-foot prosthesis emulator with control of plantarflexion and inversion-eversion torque. IEEE Transactions on Robotics, 2017. in press.

[43] M. Kim and S. H. Collins. Once-per-step control of ankle-foot prosthesis push-off work reduces effort associated with balance during human walking. Journal of NeuroEngineering and Rehabilitation, 12, 2015.

[44] M. Kim and S. H. Collins. Once-per-step control of ankle push-off work improves balance in a three-dimensional simulation of bipedal walking. IEEE Transactions on Robotics, 2017. in press.

[45] S. Kim, C. G. Atkeson, and S. Park. Perturbation-dependent selection of postural feedback gain and its scaling. Journal of Biomechanics, 45(8):1379–1386, 2012.

[46] J. R. Koller, D. H. Gates, D. P. Ferris, and C. D. Remy. Body-in-the-loop optimization of assistive robotic devices: a validation study. In Robotics: Science and Systems Conference (RSS), 2016.

[47] T. Kwon and J. K. Hodgins. Control systems for human running using an inverted pendulum model and a reference motion capture sequence. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA ’10, 2010.

[48] Campbell Muscle Lab. Computer modeling. http://www.campbellmusclelab.org/research-1/computer-modeling, 2017.

[49] G. Lan and S. X. Sun. Dynamics of myosin-driven skeletal muscle contraction: I. steady-state force generation. Biophysical Journal, 88:4107–4117, 2005.

[50] S. Lee, S. Crea, P. Malcolm, I. Galiana, A. Asbeck, and C. Walsh. Controlling negative and positive power at the ankle with a soft exosuit. In IEEE/RAS International Conference on Robotics and Automation (ICRA), pages 3509–3515, 2016.

[51] C. Liu, C. G. Atkeson, and J. Su. Biped walking control using a trajectory library. ROBOTICA, 31:311–322, 2013.

[52] Chenggang Liu, Christopher G Atkeson, Siyuan Feng, and X Xinjilefu. Full-body motion planning and control for the car egress task of the DARPA Robotics Challenge. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2015.

[53] P. Malcolm, W. Derave, S. Galle, and D. De Clercq. A simple exoskeleton that assists plantarflexion can reduce the metabolic cost of human walking. PLoS ONE, 8(e56137), 2013.

[54] P. Malcolm, R. E. Quesada, J. M. Caputo, and S. H. Collins. The influence of push-off timing in a robotic ankle-foot prosthesis on the energetics and mechanics of walking. Journal of NeuroEngineering and Rehabilitation, 12, 2015.

[55] A. Mansson. Actomyosin-ADP states, interhead cooperativity, and the force-velocity relation of skeletal muscle. Biophysical Journal, 98:1237–1246, 2010.

[56] L. Marcucci and C. Reggiani. Mechanosensing in myosin filament solves a 60 years old conflict in skeletal muscle modeling between high power output and slow rise in tension. Frontiers in Physiology, 7(427), 2016.

[57] L. Marcucci, T. Washio, and T. Yanagida. Including thermal fluctuations in actomyosin stable states increases the predicted force per motor and macroscopic efficiency in muscle modelling. PLoS Comput Biol, 12(9), 2016.

[58] M. Mistry, A. Murai, K. Yamane, and J. Hodgins. Sit-to-stand task on a humanoid robot from human demonstration. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages 218–223, 2010.

[59] L. M. Mooney, E. J. Rouse, and H. M. Herr. Autonomous exoskeleton reduces metabolic cost of human walking. J. NeuroEng. Rehabil., 11(151), 2014.

[60] K. Nishikawa. Eccentric contraction: unraveling mechanisms of force enhancement and energy conservation. Journal of Experimental Biology, 219:189–196, 2016.

[61] Gerald Offer and K. W. Ranatunga. Reinterpretation of the tension response of muscle to stretches and releases. Biophysical Journal, 111:2000–2010, 2016.

[62] R. E. Quesada, J. M. Caputo, and S. H. Collins. Increasing ankle push-off work with a powered prosthesis does not necessarily reduce metabolic rate for transtibial amputees. Journal of Biomechanics, 49:3452–3459, 2016.

[63] T. J. Roberts and R. A. Belliveau. Sources of mechanical power for uphill running in humans. Journal of Experimental Biology, 208:1963–1970, 2005.

[64] B. D. Robertson, D. J. Farris, and G. S. Sawicki. More is not always better: modeling the effects of elastic exoskeleton compliance on underlying ankle muscle-tendon dynamics. Bioinspir. Biomim., 9, 2014.

[65] S. Sanan, M. H. Ornstein, and C. G. Atkeson. Physical human interaction for an inflatable manipulator. In IEEE Engineering in Medicine and Biology Society (EMBC), pages 7401–7404, 2011.

[66] Y. Sato, E. Akiyama, and J. D. Farmer. Chaos in learning a simple two-person game. PNAS, 99(7):4748–4751, 2002.

[67] S. Schaal and C. G. Atkeson. Learning control for robotics. IEEE Robotics & Automation Magazine, 17(2):20–29, 2010.

[68] J. C. Selinger and J. M. Donelan. Estimating instantaneous energetic cost during non-steady-state gait. J. Appl. Physiol., 117:1406–1415, 2014.

[69] J. C. Selinger, S. M. O’Connor, J. D. Wong, and J. M. Donelan. Humans can continuously optimize energetic cost during walking. Curr. Biol., 25:2452–2456, 2015.

[70] P. R. Shorten, P. O’Callaghan, J. B. Davidson, and T. K. Soboleva. A mathematical model of fatigue in skeletal muscle force contraction. J Muscle Res Cell Motil, 28:293–313, 2011.

[71] Kwang Won Sok, Katsu Yamane, Jehee Lee, and Jessica Hodgins. Editing dynamic human motions via momentum and force. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA ’10, pages 11–20, 2010.

[72] M. Srinivasan and A. Ruina. Computer optimization of a minimal biped model discovers walking and running. Nature, 439:72–75, 2006.

[73] B. Stephens. Dynamic balance force control for compliant humanoid robots. In International Conference on Intelligent Robots and Systems (IROS), pages 1248–1255, 2010.

[74] B. Stephens. Push Recovery Control for Force-Controlled Humanoid Robots. PhD thesis, Carnegie Mellon University, 2011.

[75] Benjamin J. Stephens and Christopher G. Atkeson. Push Recovery by stepping for humanoid robots with force controlled joints. 2010 10th IEEE-RAS Int. Conf. Humanoid Robot., pages 52–59, 2010.

[76] Martin Stolle and Christopher G. Atkeson. Finding and transferring policies using stored behaviors. Autonomous Robots, 29(2):169–200, 2010.

[77] T. K. Uchida, A. Seth, S. Pouya, C. L. Dembia, J. L. Hicks, and S. L. Delp. Simulating ideal assistive devices to reduce the metabolic cost of running. PLoS ONE, 11(e0163417), 2016.

[78] B. R. Umberger. Stance and swing phase costs in human walking. J. Roy. Soc. Int., 7:1329–1340, 2010.

[79] B. R. Umberger, K. G. M. Gerritsen, and P. E. Martin. A model of human muscle energy expenditure. Comput. Methods Biomech. Biomed. Eng., 6(2):99–111, 2003.

[80] A. J. van den Bogert. Exotendons for assistance of human locomotion. Biomed. Eng. Online, 2(1), 2003.

[81] W. van Dijk and H. van der Kooij. Xped2: A passive exoskeleton with artificial tendons. Rob. Autom. Mag., 21:56–61, 2014.

[82] S. R. Ward, C. M. Eng, L. H. Smallwood, and R. L. Lieber. Are current measurements of lower extremity muscle architecture accurate? Clin. Orthop. Relat. Res., 467:1074–1082, 2009.

[83] E. Whitman and C. G. Atkeson. Control of instantaneously coupled systems applied to humanoid walking. In International Conference on Humanoid Robots, pages 210–217, 2010.

[84] E. C. Whitman and C. G. Atkeson. Multiple model robust dynamic programming. In American Control Conference (ACC), pages 5998–6004, 2012.

[85] E. C. Whitman, B. J. Stephens, and C. G. Atkeson. Torso rotation for push recovery using a simple change of variables. In International Conference on Humanoid Robots, pages 50–56, 2012.

[86] Eric Whitman. Coordination of Multiple Dynamic Programming Policies for Control of Bipedal Walking. PhD thesis, Robotics Institute, Carnegie Mellon University, 2013.

[87] K. A. Witte, J. Zhang, R. W. Jackson, and S. H. Collins. Design of two lightweight, high-bandwidth torque-controlled ankle exoskeletons. In IEEE/RAS International Conference on Robotics and Automation (ICRA), pages 1223–1228, 2015.

[88] D. Xing, C. G. Atkeson, J. Su, and B. Stephens. Gain scheduled control of perturbed standing balance. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4063–4068, 2010.

[89] Xinjilefu. State Estimation for Humanoid Robots. PhD thesis, Robotics Institute, Carnegie Mellon University, 2015.

[90] Xinjilefu and C.G. Atkeson. State estimation of a walking humanoid robot. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 3693–3699, 2012.

[91] X Xinjilefu, Siyuan Feng, and Christopher G. Atkeson. Center of mass estimator for humanoids and its application in modelling error compensation, fall detection and prevention. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2015.

[92] K. Yamane, S.O. Anderson, and J.K. Hodgins. Controlling humanoid robots with human motion data: Experimental validation. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages 504–510, 2010.

[93] K. Yamane and J. Hodgins. Control-aware mapping of human motion data with stepping for humanoid robots. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 726–733, 2010.

[94] K. E. Zelik, S. H. Collins, P. G. Adamczyk, A. D. Segal, G. K. Klute, D. C. Morgenroth, M. E. Hahn, M. S. Orendurff, J. M. Czerniecki, and A. D. Kuo. Systematic variation of prosthetic foot spring affects center-of-mass mechanics and metabolic cost during walking. Trans. Neur. Sys. Rehab. Eng., 19:411–419, 2011.

[95] M. Zucker, J. A. Bagnell, C. G. Atkeson, and J. Kuffner. An optimization approach to rough terrain locomotion. In IEEE International Conference on Robotics and Automation (ICRA), pages 3589–3595, 2010.

[96] M. Zucker, N. D. Ratliff, M. Stolle, J. E. Chestnutt, J. A. Bagnell, C. G. Atkeson, and J. Kuffner. Optimization and learning for rough terrain legged locomotion. International Journal of Robotic Research, 30(2):175–191, 2011.