Shared control–based bimanual robot manipulation


Science Robotics  29 May 2019:
Vol. 4, Issue 30, eaaw0955
DOI: 10.1126/scirobotics.aaw0955


Human-centered environments provide affordances for and require the use of two-handed, or bimanual, manipulations. Robots designed to function in, and physically interact with, these environments have not been able to meet these requirements because standard bimanual control approaches have not accommodated the diverse, dynamic, and intricate coordinations between two arms to complete bimanual tasks. In this work, we enabled robots to more effectively perform bimanual tasks by introducing a bimanual shared-control method. The control method moves the robot’s arms to mimic the operator’s arm movements but provides on-the-fly assistance to help the user complete tasks more easily. Our method used a bimanual action vocabulary, constructed by analyzing how people perform two-hand manipulations, as the core abstraction level for reasoning about how to assist in bimanual shared autonomy. The method inferred which individual action from the bimanual action vocabulary was occurring using a sequence-to-sequence recurrent neural network architecture and turned on a corresponding assistance mode, signals introduced into the shared-control loop designed to make the performance of a particular bimanual action easier or more efficient. We demonstrate the effectiveness of our method through two user studies that show that novice users could control a robot to complete a range of complex manipulation tasks more successfully using our method compared to alternative approaches. We discuss the implications of our findings for real-world robot control scenarios.


Human-centered environments are tailored for two-handed, or bimanual, manipulations. Ever since neural structures in the brains of human ancestors started evolving to facilitate interactions between the two hands, societies have adapted their settings around these abilities, affording complex tool use, manual labor, meal preparation, and communicative gestures (1–3). These societal and evolutionary underpinnings of bimanual processes are evident in day-to-day environments and activities, such as when securing a jar with one hand while twisting its lid with the other to open the jar, when securing a bowl in place with one hand while stirring with the other, when lifting a laundry basket with both hands holding the handles on either side, and when passing plates from one hand to another when setting the table for dinner. Whereas human-centered environments afford, and often require, bimanual manipulations, robots that are designed to function in, and physically interact with, these environments have not been able to meet these requirements.

How do robots currently approach bimanual tasks?

Robot platforms have historically been designed with single-arm abilities. Consequently, enabling such robots to perform bimanual tasks requires modifying either the task or the environment. For example, if a robot was tasked with unscrewing the cap off of a water bottle, then the bottle would need to be secured to a table ahead of time for the robot to execute the task with a single arm. This single-arm limitation has been shown to make robot actions more difficult to interpret and relate to (4, 5) and poses a substantial barrier to realizing the full potential of robots for functioning in human environments and assisting people in day-to-day tasks (6, 7).

When, alternatively, two robot arms are used in current manipulation and control methods, the bimanual problem is generally reduced to concurrent instances of single-arm approaches or single-function mechanisms [for a review, see Smith et al. (8); we also provide a Related Works section in the Supplementary Materials]. Whether it be for multi-arm grasping (9); multi-arm motion planning (10–12); kinematic (13–16), impedance (17–19), or hybrid-mode (20–22) controllers; teleoperation interfaces (23–26); or active vision (27–29), two robot arms in current methods either exhibit independent behavior with only limited coordination, such as collision avoidance, or show only single-function bimanual abilities, such as only being able to grasp and stabilize an object with two hands, without accommodating other bimanual skills, such as passing objects from one hand to the other.

How do people approach bimanual tasks?

Whereas robots have not been able to realize the full breadth of bimanual manipulations in human-centered environments, people instinctively perform many such manipulations in day-to-day life. Understanding the differences between current bimanual robot control approaches and the way that the human brain considers two-handed manipulations might reveal factors that contribute to the bimanual manipulation ability gap between people and robots.

Much previous work in neuroscience, neurophysiology, and rehabilitation suggests that current robot control and manipulation paradigms, i.e., considering two-handed control as concurrent instances of single-arm approaches or as single-function mechanisms, do not reflect how the brain considers bimanual manipulations. For example, studies have indicated that the brain does not command bimanual manipulations by simply superimposing two independent single-arm representations (30, 31). Instead, dedicated regions of the brain, such as the supplementary motor area (32) and primary motor cortex (33), exhibit unique neural patterns specific to bimanual manipulations (34). This effect is illustrated in a study by Ifft et al. (34) where the authors successfully controlled the bimanual arm motions of rhesus monkeys using a brain-machine interface by targeting the areas of the brain specific to bimanual movements as opposed to separately targeting the brain regions associated with right- and left-arm unimanual movements. A leading theory for describing the cognitive bases for bimanual actions, called internal model theory, posits that the brain maintains a centralized symbolic representation for bimanual movements arbitrated by specialized brain regions (35).

Previous work has also shown the remarkable dynamism and flexibility of brain activity when performing bimanual tasks (36, 37). This dynamic nature of brain networks during bimanual activity facilitates switching functions to accommodate various environmental constraints, task difficulty levels, and spatiotemporal relationships between the two arms (3). All leading theories for modeling bimanual behavior—including dynamical systems theory (38, 39), muscle synergy theory (40), internal model theory (35), and optimal feedback control theory (41)—agree that the nature of coordination between the arms during bimanual movements dynamically changes depending on current task constraints (42).

Swinnen and Wenderoth (3) combined the concepts of a centralized action semantics in the brain during bimanual movements and the dynamic function switching dependent on the bimanual task to describe a “gestalt” phenomenon where the individual motions of each arm are promoted to achieve more than the sum of their parts. This body of work suggested that any bimanual motion planning, control, or manipulation method that does not consider the centralized action semantics or dynamic function switching involved in bimanual actions, such as the current methods outlined above, will fail to achieve this proposed gestalt effect and be limited in applicability and scope.

Our solution

The goal of the current work is to extend the abilities of robots to effectively function in human-centered environments by achieving the gestalt effect involved in human bimanual manipulations. To illustrate, consider a bimanual robot platform that is installed in a home environment to provide assistance for an older adult. The robot would need to perform a wide variety of bimanual tasks in this scenario, such as opening pill bottles, carrying a laundry basket, or stirring a meal while keeping the pan stable on the stove. Each of these tasks is composed of various bimanual actions—individual types of two-handed movements that would benefit from a particular control strategy. Our goal was to capture these diverse, dynamic, and intricate actions and interactions between the hands that commonly occur throughout bimanual manipulations in such tasks and to enable control mechanisms that support these interactions. For example, when carrying a laundry basket, the correspondence between the two hands, specifically the fixed translation and rotation offset of the hands as dictated by the laundry basket between them, is more critical than the individual motions of each independent hand. Thus, a central premise throughout our work is that a successful bimanual control method will leverage the higher-level actions and interactions between the two hands, which often take precedence over the independent behavior of each hand.

To develop such control for robots that considers the higher-level actions and interactions between the hands, we argue that three technical challenges must be addressed:

1) How should the robot organize the wide range of bimanual manipulations in a manner that allows us to provide mechanisms to support them?

2) How should the robot identify which bimanual coordination is needed so that it can apply the appropriate control strategy?

3) What control strategy should the robot implement for each bimanual coordination type?

Our current work explored bimanual manipulation in a real-time control scenario. Specifically, we formulated the problem as a bimanual shared-control method, i.e., a control method that aimed to reduce the tedium or difficulty of direct control by enabling the robot to handle some aspects of the control process (43). Using such a method, the robot was able to arbitrate between a user’s command inputs and its own underlying motion policies and understanding of bimanual tasks. Our methods identified which action the user was performing and adapted the control algorithm to provide assistance in executing the action by adapting the robot’s movements. For example, in the laundry basket carrying example provided above, even if the user did not exactly maintain the fixed offset task constraint while specifying their motion inputs, the robot could use its understanding of the underlying bimanual coordination to maintain a fixed offset between its end effectors such that it does not drop or break the laundry basket. An overview of our work is illustrated in Fig. 1. Although our current work frames our proposed approach in real-time control, we expect our core ideas to apply more generally, for example, to a system in which the inputs come from an autonomous robot’s perception and planning algorithms instead of a human operator.

Fig. 1 A shared-control method for effective bimanual robot manipulation.

(A) We constructed a motion dataset of people performing two-handed tasks and (B) extracted high-level kinematic patterns from the data to build a compact and lightweight bimanual action vocabulary that sufficiently spans the space of two-handed actions. (C) While the user is controlling the robot, (D) the method infers which action from the bimanual action vocabulary is most likely being specified by the user and (E) engages an appropriate assistance mode (F) to help during the respective bimanual action.

How should the robot organize the range of bimanual manipulations?

To address the first technical challenge, we explored the idea that there exists a small set of bimanual action classes that can abstractly represent a wide range of possible bimanual manipulations. Supported by the theories discussed above, particularly the theory that the brain maintains a centralized action semantics in specialized brain regions (3, 35), our proposed method represented bimanual actions as labeled elements in a bimanual action vocabulary that spanned possible forms of coordination that the two arms may try to achieve in a task.

Given the premise that a concise set of actions and interactions between the hands will often take precedence over the independent actions of each hand during bimanual manipulations, as supported by previous work above, a natural question arises: What are these actions and interactions between the two hands? Previous work has successfully isolated a set of control semantics in the brains of rhesus monkeys using surgically implanted electrodes to activate their bimanual actions (34). We argue that if the brain has a representation of central bimanual actions that it uses to organize the interactions, constraints, cooperative movements, and asymmetric movements between the user’s hands, a theory supported by much previous work (3, 35, 42, 40), then we should be able to isolate, recognize, and label such features as distinct patterns in the user’s hand motions when performing bimanual tasks. To explore this premise, we first conducted a formative study in which we recorded and analyzed the hand motions of human participants through various bimanual tasks. Through a kinematic pattern analysis, we distilled this space down to a bimanual action vocabulary designed to characterize the bimanual manipulation space in a flexible and comprehensible manner. An evaluation of the vocabulary shows that it could serve as an effective abstraction level to specify to the robot how bimanual tasks should be accomplished.

How should the robot identify which action is needed?

Our solution to the second technical challenge involved classifying which bimanual action from the bimanual action vocabulary is most likely being specified by the user at a given time. The method observed the recent stream of the user’s motion inputs as a state model and used a sequence-to-sequence recurrent neural network to infer the most probable bimanual action being specified. This solution is analogous to how the brain considers bimanual neural processes, according to the internal model theory of bimanual action specification (35). The method recognized from a centralized action semantics how the two hands are likely to be coordinating and was subsequently able to modulate the control processes based on the currently inferred task constraints.

What control strategy should the robot implement for each bimanual action?

Our solution to the third technical challenge involved a control approach that enabled the robot to move its arms in a coordinated fashion following three phases. First, it captured the poses (position and rotation) of the user’s hands at each update so that it could map the user’s hand motions onto the robot’s end effectors in real time. This step, called motion retargeting, involved mapping motion from one articulated figure (e.g., a teleoperator, motion-capture actor, etc.) to a potentially vastly dissimilar articulated figure (e.g., a robot, animated character, etc.), such that important motion or pose properties were maintained (44, 45). This process allowed users to specify what actions they want the robot to perform by simply and naturally providing desired motions with their own arms, an approach that we have termed mimicry control in previous work (46, 47). Second, based on the recent stream of the user’s hand poses, the method used our solution to the second technical challenge in the control loop to infer which bimanual action from the bimanual action vocabulary that the user is most likely trying to specify to the robot using a sequence-to-sequence recurrent neural network. Third, given the inferred bimanual action at the current time, the method dynamically engaged an appropriate assistance mode, signals introduced into the shared-control loop designed to make the performance of a particular bimanual action easier or more efficient. Thus, although the users felt as if they had direct control over both of the robot’s arms, the robot subtly overrode their direct inputs to meet task constraints between the two hands given the currently inferred bimanual action. 
Two examples of assistance modes in our method (outlined in more detail below) are constraining the two hands to maintain a fixed translation and rotation offset when lifting a rigid object with two hands and ensuring that the robot’s end effectors are at the correct distance from each other when performing a self-handover.

To establish the effectiveness of our proposed solutions, we have evaluated our bimanual shared-control method through two laboratory studies involving naïve human participants. The first study compared our method against alternative control approaches, including an unassisted bimanual approach, as well as a state-of-the-art single-arm control interface. The second study provides details on the relative contributions of each assistance mode used in our method. The results of these studies provide insight into the potential impact of introducing bimanual robot operation on real-world robot control scenarios and other related human-robot collaboration domains, including fully autonomous bimanual manipulations and bimanual robot teaching.

The contributions of our overall work include (i) introducing an appropriate abstraction level for how people instinctively perform tasks with two hands, (ii) enabling robots to use this abstraction to interpret and reason about bimanual manipulations in real-world environments, and (iii) making robot control in real-world, human-centered environments easier and more effective, even for novice users, by providing users with the ability to control the robot analogously to how they would naturally perform tasks with two hands themselves.


This section presents the technical solutions to the three challenges outlined above and the findings from their evaluation. Further technical, implementation, and experimental details are provided in the Supplementary Materials.

Solution to challenge 1: Bimanual action vocabulary

A central premise throughout our work is that a successful bimanual shared-control method will tune the control behavior based on the higher-level actions and interactions between the two hands. To realize this robot behavior, we must first discover and define these actions and interactions between the two hands. Our strategy to discover these bimanual actions and interactions in this work is as follows: (i) record a dataset consisting of participants’ hand poses over time while executing various bimanual tasks, (ii) filter the dataset such that data signals are not conflated with extraneous noise, (iii) analyze the dataset to assess what high-level patterns emerge when people perform bimanual tasks, and (iv) organize the observed bimanual patterns into a bimanual action vocabulary, such that the elements in the action vocabulary are robust, interpretable, and cover a wide breadth of common bimanual manipulations. In this section, we describe the kinematic pattern analysis we conducted to distill the space of bimanual manipulations down to a bimanual action vocabulary.

Formative study

To form a bimanual action vocabulary, we first assessed how people perform bimanual tasks. We ran a participant study where we collected a bimanual manipulation dataset by recording the six-degree-of-freedom (DOF) translation and rotation pose information of each participant’s hands while they performed a set of two-handed tasks. Details about how we collected and filtered our dataset can be found in the Supplementary Materials. Our whole dataset is structured as

$$D = \{(p_1^d, p_1^n, q_1^d, q_1^n),\ (p_2^d, p_2^n, q_2^d, q_2^n),\ \ldots,\ (p_T^d, p_T^n, q_T^d, q_T^n)\}$$

Here, $p_t^d$ and $q_t^d$ are the position and orientation of the dominant hand at time step $t$, $p_t^n$ and $q_t^n$ are the position and orientation of the nondominant hand at time step $t$, and $T$ denotes the final time step.

Feature construction

Given the collected and filtered dataset, our goal is to use the data to categorize actions and interactions that occur between people’s hands during bimanual manipulations. We transformed the data to encode each pose in a manner that allowed for analysis of the interactions between people’s hands. For each pair of six-DOF hand configurations, we computed six different scalar-valued features that measured various relationships between the hands, independent of their absolute position. The values used only relative coordinates, such that models learned through our analyses were not tied to the absolute coordinate frame of the motion capture setup. The six scalar values are (i) the distance between the hands (denoted as hand offset), (ii) the rate of change of the distance between the hands (denoted as hand offset velocity), (iii) the dominant hand’s translational velocity, (iv) the dominant hand’s rotational velocity, (v) the nondominant hand’s translational velocity, and (vi) the nondominant hand’s rotational velocity. A feature at a single time point, $s_t$, is structured as

$$s_t = \begin{bmatrix}
\lVert p_t^d - p_t^n \rVert \\
\lVert p_t^d - p_t^n \rVert - \lVert p_{t-1}^d - p_{t-1}^n \rVert \\
\lVert p_t^d - p_{t-1}^d \rVert \\
\lVert \mathrm{disp}(q_t^d, q_{t-1}^d) \rVert \\
\lVert p_t^n - p_{t-1}^n \rVert \\
\lVert \mathrm{disp}(q_t^n, q_{t-1}^n) \rVert
\end{bmatrix} \qquad (1)$$

Here, velocities are approximated using backward finite differencing, and disp is the standard displacement operator for quaternions: $\mathrm{disp}(q_1, q_2) = \log(q_1^{-1} * q_2)$ (48). Although we show that these six features are sufficient for representing interactions between the two hands in our kinematic pattern analysis, we note that this set of features is only one of many possible sets. For example, one could supplement this set of features with force and moment information to also characterize the dynamics of bimanual manipulation in an action vocabulary.
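As an illustration of Eq. 1, the six features can be computed from two consecutive pose samples. This is a minimal sketch, not the authors' implementation: it assumes unit quaternions stored as `[w, x, y, z]`, and the helper names (`quat_mul`, `quat_log`, `disp`, `frame_features`) are hypothetical. Picking the shorter of the two equivalent rotations is also an assumption. Under this logarithm convention, the rotational term is half the rotation angle per time step.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as [w, x, y, z]."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def quat_log(q):
    """Logarithm of a unit quaternion as a 3-vector (axis times half-angle)."""
    if q[0] < 0.0:
        q = -q  # assumption: q and -q encode the same rotation, use the shorter one
    n = np.linalg.norm(q[1:])
    if n < 1e-12:
        return np.zeros(3)
    return q[1:] / n * np.arctan2(n, q[0])

def disp(q1, q2):
    """Quaternion displacement disp(q1, q2) = log(q1^{-1} * q2)."""
    q1_inv = q1 * np.array([1.0, -1.0, -1.0, -1.0])  # conjugate = inverse for unit quaternions
    return quat_log(quat_mul(q1_inv, q2))

def frame_features(pd, pn, qd, qn, pd_prev, pn_prev, qd_prev, qn_prev):
    """Six scalar features s_t from the current and previous hand poses (Eq. 1)."""
    offset = np.linalg.norm(pd - pn)
    offset_prev = np.linalg.norm(pd_prev - pn_prev)
    return np.array([
        offset,                              # (i) hand offset
        offset - offset_prev,                # (ii) hand offset velocity
        np.linalg.norm(pd - pd_prev),        # (iii) dominant translational velocity
        np.linalg.norm(disp(qd, qd_prev)),   # (iv) dominant rotational velocity
        np.linalg.norm(pn - pn_prev),        # (v) nondominant translational velocity
        np.linalg.norm(disp(qn, qn_prev)),   # (vi) nondominant rotational velocity
    ])
```

Because every term is a norm of a difference or of a relative rotation, the features do not change if both hands are translated together in the motion-capture frame.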

Because the features $s_t$ only encode a single, discrete hand pose event, we also window many such events together in a long concatenated vector to encode motion over time:

$$f_t = [\, s_{t-\omega/2},\ s_{t-\omega/2+1},\ \ldots,\ s_t,\ \ldots,\ s_{t+\omega/2-1},\ s_{t+\omega/2} \,] \qquad (2)$$

In our analyses, a single feature encodes a second of motion (ω = 80).
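The windowing in Eq. 2 amounts to slicing and flattening a feature array. The sketch below is only an illustration: it assumes the per-frame features are stored as rows of a `(T, 6)` array, and the function name `window_features` is hypothetical. With the index range written in Eq. 2, each window holds ω + 1 frames. A reader who wants exactly ω frames would drop one endpoint.

```python
import numpy as np

def window_features(S, t, omega=80):
    """Concatenate s_{t-omega/2} .. s_{t+omega/2} (rows of S) into one vector f_t."""
    half = omega // 2
    if t - half < 0 or t + half >= len(S):
        raise IndexError("window extends past the start or end of the recording")
    # Rows are flattened in time order, so each frame's six features stay adjacent.
    return S[t - half : t + half + 1].reshape(-1)
```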

Principal components analysis

Given the feature vectors ft, our goal is to analyze these vectors such that high-level patterns characterizing the actions and interactions between the hands emerge. Our conjecture is that, although our feature-vector space has many high-dimensional data points, there is a set of just a few component vectors that can effectively span this space. If such a small set of components exists, then this would indicate that only a few central kinematic actions arise when people do bimanual tasks.

To search for such a set of bimanual actions, we used principal components analysis (PCA), a common statistical technique for dimensionality reduction (49). The scree plot in Fig. 2A indicates the explained variance with respect to each principal component. We observed that most of the variance in the bimanual dataset could be explained with just the first seven principal components before hitting an inflection point on the scree plot and leveling out. Figure 2B illustrates the resulting principal components from the PCA to give a sense of what the dimensions of the bimanual action-vocabulary space represent. In the following section, we describe how we used these principal components in our subsequent manual analysis.
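The explained-variance values behind a scree plot can be obtained from a singular value decomposition of the centered feature matrix. The sketch below is a generic PCA and is not taken from the authors' code. It assumes the windowed vectors are stacked as rows of a matrix `F`, and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(F):
    """PCA of F (num_windows x dim) via SVD.

    Returns the principal components as rows, the fraction of variance
    explained by each component, and the data projected onto the components.
    """
    X = F - F.mean(axis=0)                       # center each feature dimension
    U, sing, Vt = np.linalg.svd(X, full_matrices=False)
    variance = sing ** 2 / (len(F) - 1)          # variance along each component
    explained_ratio = variance / variance.sum()  # values plotted in a scree plot
    scores = X @ Vt.T                            # coordinates in the component basis
    return Vt, explained_ratio, scores
```

Plotting `explained_ratio` against the component index gives the scree plot, and the point where the curve flattens suggests how many components to keep.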

Fig. 2 Kinematic pattern analyses.

(A) Scree plot from our kinematic pattern analysis using PCA. The inflection point indicates that seven principal components cover much of the variance in the bimanual action space. (B) First seven principal components (displayed as colored lines in the graphs) shown over 80 time steps for each of our six kinematic features (hand offset, hand offset velocity, etc.). (C) Illustration of how the third principal component [the red lines in (B)] connects to the self-handover bimanual action. The dotted lines point to particular landmarks over the different kinematic features that characterize the self-handover action. (D) Seven principal components are grouped into four high-level “words” in our bimanual action vocabulary: fixed offset, one hand fixed, self-handover, and one hand seeking.

Bimanual actions from analysis

In this section, we attach semantic meaning to each of our principal components using a manual, post hoc analysis to construct a bimanual action vocabulary. We manually organized and labeled the top PCA components to clarify what these components actually mean in the context of a bimanual manipulation task. Our analysis first involved finding points within our motion dataset that correspond well with the particular principal components. For example, when assessing the high-level semantic action corresponding to principal component 1 (the blue curve in Fig. 2B), we found points that, when expressed using the principal components as a basis, were close to $[a, 0, 0, 0, 0, 0, 0]^T$ for some real value $a$. Then, we reviewed the study videos to interpret and further organize how the participants’ hands were interacting with one another at these representative points. This type of analysis would be infeasible to do in an automated way due to the highly contextualized nature of the assessment. The resulting bimanual actions from our analysis are explained here and are summarized in Fig. 2:

1) Fixed offset. Principal components 1, 6, and 7 all correspond to a similar kinematic pattern where the offset between the hands does not exhibit much change, and the translational and rotational velocities of both of the hands follow similar characteristics. Upon further investigation, we found that these motions happened when participants were holding an object with two hands, such that the offset of the hands was dictated by the object being moved. The hands had to follow similar translational and rotational velocity profiles because the rigid object between the hands constrained each hand from exhibiting its own independent translation or rotation.

2) One hand fixed. Principal component 2 corresponds to the two hands working in close proximity, where one hand is relatively stationary while the other hand maintains a high translational and/or rotational velocity. We found that these motions occurred when one hand was holding an object in place while the other hand did some manipulation with respect to this object. An example of such an action is holding a bowl in place with one hand while stirring in the bowl with the other hand.

3) Self-handover. Principal component 3 corresponds to the hands coming together quickly before easing at the end of the action. Principal component 5 consists of the hands starting together and moving apart at a moderate speed. Upon further investigation, we found that these two components occur sequentially when participants are initiating and completing a self-handover, respectively.

4) One hand seeking. Principal component 4 corresponds to the hands moving apart at a fast rate, with one hand maintaining a high translational and rotational velocity, while the other hand remains relatively still. When investigating further, we observed that this action occurred when participants were reaching for an object in the workspace. We note that the top principal components here do not exhibit any motions where the hands move apart, while both hands maintain a high velocity. We believe that this indicates that people rarely reach for separate objects simultaneously with both hands, instead only reaching with one hand at a time. Thus, our bimanual action here is one hand seeking an object.
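The search for representative points that precedes the manual labeling above, i.e., windows whose coordinates in the principal component basis are close to $[a, 0, \ldots, 0]^T$, can be sketched as a ranking by alignment with one axis. The alignment score, the `min_norm` threshold, and the function name are assumptions made for illustration. The source does not state how closeness was measured.

```python
import numpy as np

def representative_windows(scores, component, top_k=5, min_norm=1e-6):
    """Indices of windows whose PCA coordinates lie closest to one component axis.

    scores: (num_windows, k) coordinates in the basis of the top k components.
    A window scores 1.0 when all of its weight falls on the chosen component.
    """
    norms = np.linalg.norm(scores, axis=1)
    safe_norms = np.where(norms > 0.0, norms, 1.0)
    alignment = np.abs(scores[:, component]) / safe_norms
    alignment[norms < min_norm] = 0.0  # skip near-zero windows with no clear direction
    order = np.argsort(-alignment, kind="stable")
    return order[:top_k]
```

The returned indices point back to time steps in the recordings, which is where a reviewer would look in the study videos.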

Solution to challenge 2: Bimanual action inference

To appropriately provide support mechanisms in the bimanual controller for the bimanual actions specified above, the method must have some way to discern which bimanual action is currently happening. To do this, we introduced a bimanual action inference method using a sequence-to-sequence recurrent neural network architecture. We chose a recurrent neural network architecture in this scenario because of its reported successes in inferring events in time-series data in previous work (50, 51). Specifically, we took advantage of the temporal structure in this data stream by using a recurrent neural network with a single long short-term memory layer (52). For full implementation and analysis details on our bimanual action inference approach, refer to the Supplementary Materials.
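To make the shape of such an inference step concrete, the sketch below runs a single long short-term memory layer over a stream of six-dimensional feature vectors and emits one probability distribution over the four vocabulary actions per time step. It is a forward pass only, with random and untrained weights. The hidden size, the class name `TinyLSTMTagger`, the label strings, and the choice of exactly four output classes are assumptions. The authors' architecture and training details are in their Supplementary Materials.

```python
import numpy as np

ACTIONS = ["fixed_offset", "one_hand_fixed", "self_handover", "one_hand_seeking"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class TinyLSTMTagger:
    """Single LSTM layer plus a linear readout; weights are random (untrained)."""

    def __init__(self, n_in=6, n_hidden=16, n_actions=len(ACTIONS), seed=0):
        rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden
        # One stacked weight matrix for the input, forget, output, and candidate gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_out = rng.normal(0.0, 0.1, size=(n_actions, n_hidden))
        self.b_out = np.zeros(n_actions)

    def forward(self, S):
        """S: (T, n_in) feature stream. Returns (T, n_actions) probabilities."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        probs = []
        for s_t in S:
            z = self.W @ np.concatenate([s_t, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell state
            h = sigmoid(o) * np.tanh(c)                   # expose the hidden state
            probs.append(softmax(self.W_out @ h + self.b_out))
        return np.array(probs)
```

In a control loop, the action with the highest probability at the latest time step, `ACTIONS[int(np.argmax(probs[-1]))]`, would select the assistance mode.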

Solution to challenge 3: Shared control–based bimanual telemanipulation

Our next goal was to use the understanding of how people perform bimanual tasks to afford an effective shared-control method for bimanual manipulators. A high-level overview of our shared-control method is provided in the Introduction.

Real-time motion retargeting using RelaxedIK

We remapped the motions of the user’s hands onto the robot’s hands on the fly through a process called motion retargeting. At each update, our shared-control method calculated joint angles during the retargeting step through a process called inverse kinematics (IK) [see Aristidou et al. (53) for a review of common IK methods]. Previous work shows that using a standard IK solver for human-to-robot motion retargeting is not effective because the resulting sequence of joint-angle solutions exhibits infeasible motion qualities, such as self-collisions, kinematic singularities, and joint-space discontinuities (46).

To address these problems, we used an optimization-based IK solver, called RelaxedIK, that is able to handle trade-offs between many objectives on the fly (54). The key insight in this method is that exactly matching the end-effector pose goals does not have to be a hard constraint, and instead, other goals—such as smooth joint motion, kinematic singularity avoidance, or self-collision avoidance—could be more important in certain situations. Certain subgoals are considered less important on the fly and are automatically “relaxed” or de-emphasized. RelaxedIK has been shown to be successful for human-to-robot motion retargeting in previous work (28, 46, 47). An abbreviated description of the problem formulation, structure, and notation behind RelaxedIK can be found in the Supplementary Materials. Our current implementation uses RelaxedIK to perform the on-the-fly motion retargeting, but other motion optimization frameworks that handle multiple kinematic chains or humanoid structures, such as the Stack-of-Tasks (55), may also successfully support our presented methods.
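The idea of treating the end-effector goal as one weighted term among several, rather than as a hard constraint, can be shown on a toy problem. The sketch below is not RelaxedIK and does not use its API. It assumes a hypothetical planar two-link arm, only two objective terms (pose matching and joint smoothness), and plain gradient descent with finite-difference gradients.

```python
import numpy as np

LINK_LENGTHS = np.array([1.0, 1.0])  # hypothetical planar two-link arm

def forward_kinematics(theta):
    """End-effector position of the planar arm for joint angles theta."""
    a1, a2 = theta[0], theta[0] + theta[1]
    return np.array([
        LINK_LENGTHS[0] * np.cos(a1) + LINK_LENGTHS[1] * np.cos(a2),
        LINK_LENGTHS[0] * np.sin(a1) + LINK_LENGTHS[1] * np.sin(a2),
    ])

def objective(theta, theta_prev, goal, w_pose=1.0, w_smooth=0.1):
    """Weighted sum of a pose-matching term and a joint-smoothness term."""
    pose_error = np.sum((forward_kinematics(theta) - goal) ** 2)
    joint_motion = np.sum((theta - theta_prev) ** 2)
    return w_pose * pose_error + w_smooth * joint_motion

def solve(theta_prev, goal, steps=500, lr=0.05, eps=1e-6, **weights):
    """Minimize the objective by gradient descent with central differences."""
    theta = theta_prev.copy()
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for j in range(len(theta)):
            d = np.zeros_like(theta)
            d[j] = eps
            grad[j] = (objective(theta + d, theta_prev, goal, **weights)
                       - objective(theta - d, theta_prev, goal, **weights)) / (2 * eps)
        theta = theta - lr * grad
    return theta
```

Raising `w_smooth` relative to `w_pose` makes the solver accept a larger end-effector error in exchange for less joint motion, which is the kind of trade-off the text describes.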

Bimanual assistance modes

In this section, we overview how our shared-control method provides assistance for each action in our bimanual action vocabulary. All mathematical details for each assistance mode can be found in the Supplementary Materials.

1) Fixed offset. The fixed offset bimanual action occurred when a user had picked up and moved an object with both hands. In this situation, the translation and rotation of the user’s hands are constrained to the object being manipulated. The high-level idea behind the fixed offset assistance mode is to keep the robot’s end effectors at the same distance with the same relative translation and rotation offset throughout the whole bimanual action such that the rigid object is successfully moved with the cooperating hands. Without this fixed spatiotemporal offset between the hands, it would be difficult for users to provide sufficient independent inputs from both hands that meet the task constraints. To provide assistance throughout this mode, we approximated a coordinate frame pose of the object being manipulated throughout the fixed offset action and then added objectives of high importance to both of the robot’s end effectors to maintain the poses of the robot’s hands with respect to the proxy object frame.

2) One hand fixed. The one hand–fixed action typically occurred when one hand was holding an item in place such that the other hand could finely perform manipulations with respect to the static object. To assist during this action, we encouraged the hand detected to be stationary in the manipulation to remain fixed. For example, if the robot was pouring a liquid into a cup, then the end effector holding the cup would remain static; even if the user exhibited small motion perturbations in the input signal, the robot would ignore this noisy behavior in favor of keeping the hand fixed, making the pouring action by the other hand easier to execute.

3) Self-handover. A self-handover action occurred when an object was passed from one hand to the other. To assist with this action, we made two adjustments to the control algorithm. Our first adjustment was to gradually decelerate the robot’s end effectors as they came together when the action was first detected. This assistance is designed to mimic the velocity profile observed when people do self-handovers, as highlighted in our kinematic pattern analysis.

Our second adjustment was to ensure that the robot’s end effectors were close together when the user’s hands were close together. Without this assistance, the user would need to cross their arms or keep their arms far apart during a handover if the robot’s arms have a different scale and geometry from the operator’s arms. To correct this, we shifted more importance in the control algorithm to the absolute distance between the robot’s end effectors when the user’s hands were close together. Thus, a self-handover action being specified by the user maintained a strong motion correspondence with the robot when it executed the self-handover action.

4) One hand seeking. The one hand seeking action occurred when one hand was reaching out for an object, while the other hand was not active in the manipulation. We assisted during this action by placing more relative importance on matching the position and rotation end-effector goals on the seeking hand, meaning that small position and rotation errors were considered more allowable on the non-seeking hand.

Because RelaxedIK is an optimization-based IK solver that can make trade-offs between many objectives, the goal of this assistance is to provide the solver with a sense of importance of one hand versus the other. Through this process, more effective joint configurations could be solved at each update that exhibited better matching of the hand that was important as opposed to matching the hand that was not contributing to the task.
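The four assistance modes described above can be sketched in simplified form as follows. This is a minimal illustration under stated assumptions, not the formulation from the Supplementary Materials: the planar midpoint proxy frame, the dead-band tolerance, the distance thresholds, and the weight values are all hypothetical, as are the function and class names.

```python
import math

# 1) Fixed offset: keep both hands rigid relative to a proxy object frame.
def proxy_frame(left, right):
    """Planar proxy frame (assumed): origin at the midpoint, heading left -> right."""
    cx, cy = (left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0
    return cx, cy, math.atan2(right[1] - left[1], right[0] - left[0])

def to_local(frame, point):
    """Express a world-frame point in the proxy frame."""
    cx, cy, theta = frame
    dx, dy = point[0] - cx, point[1] - cy
    c, s = math.cos(theta), math.sin(theta)
    return c * dx + s * dy, -s * dx + c * dy

def to_world(frame, local):
    """Map a proxy-frame point back to the world frame."""
    cx, cy, theta = frame
    c, s = math.cos(theta), math.sin(theta)
    return cx + c * local[0] - s * local[1], cy + s * local[0] + c * local[1]

class FixedOffsetAssist:
    """Stores hand offsets at action onset and reapplies them on every update."""
    def __init__(self, left, right):
        frame = proxy_frame(left, right)
        self.offsets = to_local(frame, left), to_local(frame, right)

    def goals(self, left, right):
        frame = proxy_frame(left, right)
        return to_world(frame, self.offsets[0]), to_world(frame, self.offsets[1])

# 2) One hand fixed: ignore small perturbations of the stationary hand.
def hold_fixed_hand(anchor, raw_goal, tolerance=0.02):
    """Return the latched anchor while the input stays within `tolerance` of it."""
    return anchor if math.dist(anchor, raw_goal) <= tolerance else raw_goal

# 3) Self-handover: slow down on approach and emphasize inter-hand distance.
def approach_speed_scale(hand_distance, slow_radius=0.30, min_scale=0.2):
    """Speed factor in [min_scale, 1] that shrinks as the hands come together."""
    if hand_distance >= slow_radius:
        return 1.0
    frac = max(hand_distance, 0.0) / slow_radius
    return min_scale + (1.0 - min_scale) * frac

def distance_weight(user_hand_distance, near=0.10, far=0.40):
    """Weight on matching absolute hand distance: 1 when close, 0 when far."""
    if user_hand_distance <= near:
        return 1.0
    if user_hand_distance >= far:
        return 0.0
    return (far - user_hand_distance) / (far - near)

# 4) One hand seeking: weight the seeking hand's pose error more heavily.
def seeking_weights(seeking_hand, high=1.0, low=0.2):
    other = "left" if seeking_hand == "right" else "right"
    return {seeking_hand: high, other: low}

def weighted_pose_cost(errors, weights):
    """Sum of squared per-hand pose errors, scaled by per-hand importance."""
    return sum(weights[hand] * errors[hand] ** 2 for hand in errors)

# Usage: the user's hands drift apart (0.4 -> 0.5), but the goals stay 0.4 apart.
assist = FixedOffsetAssist((0.0, 0.0), (0.4, 0.0))
left_goal, right_goal = assist.goals((1.0, 1.0), (1.0, 1.5))
```

In an optimization-based solver, quantities such as `distance_weight` and `seeking_weights` would scale objective terms rather than being applied as hard rules; they are shown as standalone functions only to keep the example short.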

Evaluation of the proposed approach

In this section, we outline the results of our two user studies conducted to evaluate our method against alternative approaches for telemanipulation. All details about the design of the two studies, including the hypotheses tested, tasks, procedure, measures, and data analysis methods, can be found in Materials and Methods.

In study 1 (effects of bimanual assistance), we evaluated the performance of our bimanual shared-control method against alternative approaches on a set of complex manipulation tasks. We present results based on a repeated-measures analysis of variance (ANOVA). All pairwise comparisons used Tukey’s HSD test to control for type I error in multiple comparisons. Results are summarized in Fig. 3.

Fig. 3 Tukey boxplots overlaid on data points from objective and subjective measures, displaying results from study 1.

Error bars designate the SEM. This study used a sample size of 24 participants (N = 24), and P values are indicated by the asterisk (*) and dagger (†) symbols.

In study 2 (relative contributions of our assistance modes), we evaluated the relative contribution of each of our bimanual assistance modes. We present results based on a repeated-measures ANOVA. All pairwise comparisons used Tukey’s HSD test to control for type I error in multiple comparisons. Results are summarized in Fig. 4.

Fig. 4 Tukey boxplots overlaid on data points from objective and subjective measures, displaying results from study 2.

Error bars designate the SEM. This study used a sample size of 24 participants (N = 24), and P values are indicated by the * and † symbols.


In this work, we extended robot manipulation abilities to include two arms by formalizing a shared-control method designed to afford effective execution of bimanual tasks. Below, we suggest the main takeaways from our work based on results from our two user studies, provide more specific discussion of the results of our studies, and overview limitations and extensions for our work.

Overview of takeaways

On the basis of results from our two user studies, we suggest four main takeaways from our work (below). A more thorough discussion of our study findings can be found in the Supplementary Materials.

1) The benefits observed with a bimanual robot control platform were elicited from reasoning about and leveraging the actions and interactions between the two arms rather than simply having two arms involved in the manipulation. In other words, we observed no benefits in using two robot arms over a single robot arm in our robot control scenario when the task intuition and corresponding assistance provided by our bimanual action vocabulary were removed. Thus, we argue that the robot having this intuition about how two arms interact with each other in tasks is paramount when two arms are present.

2) The use of assistance modes rooted in a bimanual action vocabulary provided task success and user perception benefits across a wide range of manipulation tasks.

3) Individual assistance modes may each have bearing on a particular task when used independently, although they appear to have a gestalt effect, where even more benefits were present when multiple assistance modes were afforded to the user. Thus, the ability to dynamically switch between assistance modes, such as when using our proposed sequence-to-sequence recurrent neural network inference method, is an integral part of a successful bimanual shared-control approach.

4) A bimanual robot control platform is capable of affording task performance and user perception benefits over a single-arm robot platform when used for complex manipulation tasks in human-centered environments.

Limitations and extensions

Our bimanual shared-control method has limitations that suggest many future extensions to our work. First, our method currently only considers the kinematics of the bimanual platform, rather than also considering the dynamics of manipulations. Extensions of our proposed solutions would supplement our motion pattern analysis to include force and torque data and extend our bimanual action vocabulary to include these extra dimensions. Using the new dynamics-infused bimanual action vocabulary, we would in turn extend our assistance modes to compliant control laws that dictate how the arms should work in tandem when manipulating objects. Force information would also be reflected back to the teleoperator using haptics.

In addition, our method only considered motion at the level of the user’s hand poses, although we speculate that much of human-level dexterity when using two hands is at the level of the user’s fingers. Including high-fidelity finger motion and dynamics information at the users’ fingertips would provide rich data to bolster both our bimanual action vocabulary and resulting assistance modes. We also note that the assistance modes in our current method were hand-designed and based on heuristic choices. Extensions of our work may explore ways to automate the mapping of the various control processes, such as identifying control laws directly from human motion and force signals that could generalize to various robot platforms.

A key lesson from our work is that explicitly supporting bimanual actions through dual-arm control algorithms provides value over controlling arms individually. Although our experiments only show the benefits relative to a specific single-arm controller, we believe that the control methods we introduced, as well as the general principles of bimanual action selection, may be used to extend other single-arm control approaches. Similarly, we believe that our action-based control methods may be incorporated to enhance other bimanual control schemes.

Last, our current method did not consider the user’s view of the environment when controlling the robot’s arms. Our current studies only considered cases where the user was colocated with the robot such that the user had a direct line of sight of the workspace; however, in real-world use cases, it would be common for the robot to be deployed in a remote location, making the visibility problem a pertinent consideration. Our recent work in remote telemanipulation stressed the importance of situational awareness when controlling a robot arm remotely (28). This previous work, however, did not consider the remote visibility problem when controlling two arms simultaneously. Methods that autonomously provide an effective view of a remote environment for a user controlling a bimanual robot platform would bolster our current shared-control approach.


Study design

In this section, we describe the design of our two studies, including the hypotheses tested and study design, tasks, procedure, measures, and data analysis methods. Implementation details for the prototype system used throughout our evaluations can be found in the Supplementary Materials. The evaluations conducted in this work were approved by the Institutional Review Board at the Naval Research Laboratory in Washington, DC.


Both of our studies followed a common procedure. After informed consent, participants were provided with information on the goals of the study and were invited to ask any questions they had. The participants first put on a pair of velcro motion-capture gloves, stood in a fixed location next to the robot, and waited in a comfortable initial pose with their palms level to the floor and fingers facing forward. The standing spot was selected to provide a sufficient vantage point for all subtasks and to ensure that participants would be out of the robot’s range of motion at all times for safety. The experimenter guided the participants through a practice phase on how to control the Hubo robot. The system was initialized by the experimenter, and then, the experimenter counted down from five to signal when the participant would have control. Once the countdown finished, the participant could move their arms and hands in free space to practice using the control system by picking up an empty soda bottle for up to 4 min. Note that the participants could practice moving both arms during the training phase, although the practice task just required a single arm. This training strategy was used so that the user could get accustomed to initializing the system and moving the robot’s arms without focusing on the more pertinent aspects of bimanual assistance.

After the participants felt sufficiently comfortable using the control system, they took a short break while the experimenter set up the task [the task(s) for each study are outlined below]. The participants alternated between performing the tasks under different conditions and completing a questionnaire regarding their last robot control experience. This procedure was repeated until all tasks per condition were completed. After finishing all tasks, the participants completed a demographic questionnaire and were then debriefed on the details of the study.

Study 1 hypotheses

Our central hypothesis in study 1 was that the bimanual control–assisted condition would outperform the bimanual control–non-assisted condition on all objective and subjective measures, and the bimanual control–non-assisted condition would outperform the single-arm control condition on all objective and subjective measures. We believed that the bimanual control–assisted condition would outperform the bimanual control–non-assisted condition because our assistance modes were designed to help with the intricate actions and interactions that occur between the two hands rather than just considering the bimanual control problem as two separate instances of single-arm control. Further, we believed that the bimanual control–non-assisted condition would outperform the single-arm control condition on all objective and subjective measures because we thought having two hands, even if they do not correspond with each other or provide assistance, would still be beneficial for some of the complex manipulations incorporated in the breakfast-making task.

Study 1 participants

We recruited 24 volunteers (16 males and 8 females) from the campus of the Naval Research Laboratory in Washington, DC. Participant ages ranged from 18 to 60 years [mean (M) = 35.21, SD = 14.11]. Participants reported a moderate familiarity with robots (M = 3.79, SD = 2.02 measured on a seven-point scale). Eight participants had participated in a previous robotics study.

Study 1 experimental design and tasks

Study 1 followed a 3 × 1 within-subjects design. The participants used three control methods (single-arm control, bimanual control–non-assisted, and bimanual control–assisted) to complete a breakfast-making task. The single-arm control case was included as a state-of-the-art telemanipulation method comparison from previous work by Rakita et al. (46), wherein the authors reported the success of this approach over other interfaces such as a six-DOF stylus device and touch screen interface. We included this comparison to assess how this previously reported interface extends to more dexterous manipulations and an experimental setup that is more reflective of standard human environments where two hands are often helpful. The bimanual control–non-assisted condition applies the method seen in the previous work by Rakita et al. (46) on two arms, essentially treating the bimanual problem as just two independent, simultaneous instances of single-arm mimicry control. We included this condition to see how having two arms during robot control, even when the two arms do not have a sense of each other or the actions and interactions involved in bimanual actions, will affect task performance and user perceptions. Last, the bimanual control–assisted condition implements all of our bimanual assistance modes as inspired by our bimanual action vocabulary presented throughout our work. The conditions were presented in a counterbalanced order between participants. A participant can be seen executing various tasks from study 1 in Fig. 5.

Fig. 5 A novice user executing various subtasks from study 1.

The subtasks in study 1 followed a breakfast-making theme, matching our motivating use case of remote home-care or telenursing scenarios.

To ensure the generalizability of our findings to a wide range of telemanipulation tasks in a standard human-centered environment, we developed a breakfast-making task consisting of an array of subtasks designed to test different manipulation abilities. We chose this task because one of the motivating research domains for our work is remote home care; thus, showing benefits in a simulated domain may indicate the potential of our methods to be used in such a scenario. Specifically, the task involved cracking open two large prop eggs and releasing the contents of the eggs into a bowl, removing the top of a “chopped peppers container” (a snack canister) and pouring the contents into a bowl, flipping the top off of a “salt container” (a bottle of disinfecting wipes) and pouring the contents into a bowl, “mixing egg yolks” by pouring the contents of one cup into another, back and forth, three times, moving the bowl using two hands (if available) from one table to the other table, unscrewing the top off of an “orange juice container” (a laundry detergent bottle) and pouring the contents into a cup, and removing three plates from a drying rack and setting the table to the left of the robot. Participants were asked to complete as many of the subtasks as possible in 10 min given the current control condition.

Study 1 measures and analyses

To assess participants’ performance in study 1, we measured binary success over the 15 subtasks involved in the breakfast-making task. To measure participants’ perceptions about their robot control experience under various assistance conditions, we administered a questionnaire including eight scales to measure predictability, robot intelligence, fluency, goal perception, trust, ease of use, satisfaction, and usefulness, as well as the NASA Task Load Index questionnaire. The items and Cronbach’s alpha value for each scale are featured in Table 1.

Table 1 Subjective scale measures and Cronbach’s alpha values.


Study 2 hypotheses

Our central hypothesis in study 2 was that the all assistance modes engaged and only targeted assistance mode engaged conditions would outperform both the only targeted assistance off and all assistance modes off conditions on all objective and subjective measures. We believed that this would be the case because each task was specifically designed to target a particular assistance mode; thus, the conditions where the target mode is present, even if the other assistance modes are turned off, should outperform the alternatives without the targeted assistance engaged.

Study 2 participants

We recruited 24 volunteers (13 males and 11 females) from the campus of the Naval Research Laboratory in Washington, DC. Participant ages ranged from 18 to 64 years (M = 31.92, SD = 12.53). Participants reported a relatively high familiarity with robots (M = 4.21, SD = 1.89 measured on a seven-point scale). Six participants had participated in a previous robotics study.

Study 2 experimental design and tasks

Our study 2 consisted of four separate 4 × 1 within-subjects experiments. Our goal in these experiments was to assess the relative contribution of each of our bimanual assistance modes (fixed offset, one hand fixed, self-handover, and one hand seeking) toward the performance benefits reported in study 1. For example, we wanted to assess whether our assistance mode designed to help with the self-handover action independently affords performance benefits, only elicits performance gains when used in concert with other assistance modes, or does not contribute to performance benefits at all.

To isolate the effect of each of the assistance types, each of the four experiments in study 2 consisted of a single task designed to target one of our assistance modes (fixed offset, one hand fixed, self-handover, and one hand seeking). The tasks for each assistance type are as follows:

1) Fixed offset. Participants moved a trash bin from a table in front of the robot to a table to the left of the robot. The trash bin was to be moved just by resting the robot’s hands on either side of the bin (the robot’s grippers were deactivated for this task).

2) One hand fixed. Participants opened and closed the top of a disinfecting wipes bottle three times. The bottle had a flip-up top, such that an effective strategy to open and close it involved holding the bottle steadily in place with one hand and using the robot’s other hand to manipulate the bottle.

3) Self-handover. Participants retrieved three plates one at a time off of the table with the robot’s right hand, passed the plate from right hand to left hand, and dropped the plate into a drying rack to the left of the robot.

4) One hand seeking. Participants stacked six cups into a single stack. The cups were organized in two rows of three and placed close enough together that both hands had to be used throughout the task to avoid knocking over other cups in the grid.

Each of the tasks above was performed with four assistance variations: (i) all assistance modes off, (ii) only the targeted assistance mode engaged, (iii) all assistance modes engaged except for the targeted assistance mode (referred to as only targeted assistance off), and (iv) all assistance modes engaged. Throughout study 2, the presented order of the four tasks was counterbalanced, and the order of the four assistance variations within those tasks was randomized. Participants had a maximum time of 2 min for each of the 16 task trials.
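As an illustration of how such a trial schedule could be generated, the sketch below rotates the task order across participants and shuffles the assistance variations within each task. The cyclic rotation is an assumption (the text states only that task order was counterbalanced), and the function names and seeding scheme are hypothetical.

```python
import random

TASKS = ["fixed offset", "one hand fixed", "self-handover", "one hand seeking"]
VARIATIONS = [
    "all assistance modes off",
    "only targeted assistance mode engaged",
    "only targeted assistance off",
    "all assistance modes engaged",
]


def rotated_order(items, participant_index):
    """Cyclic rotation (assumed scheme): over every len(items) consecutive
    participants, each item appears once in each position."""
    shift = participant_index % len(items)
    return items[shift:] + items[:shift]


def trial_schedule(participant_index, seed=0):
    """Return the 16 (task, variation) trials for one participant."""
    rng = random.Random(seed + participant_index)  # reproducible per participant
    schedule = []
    for task in rotated_order(TASKS, participant_index):
        variations = VARIATIONS[:]  # copy so the module-level list is untouched
        rng.shuffle(variations)     # randomized order within each task
        schedule.extend((task, variation) for variation in variations)
    return schedule


# Usage: build the schedule for the sixth participant (index 5).
schedule = trial_schedule(5)
```

With 24 participants and four rotations, each task order would be used by six participants under this scheme.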

Study 2 measures and analyses

To assess participants’ performance in study 2, we used a compound objective measure that captured both success and timeliness in performing a task. This measure takes the general form $\frac{s}{s_{max}} + \left(\frac{t_{max} - t}{t_{max}} \cdot \frac{s}{s_{max}}\right)$, where $t_{max}$ is the maximum time allowed for a particular trial, $t$ is the time it took a participant to complete $s$ subtasks in the trial, and $s_{max}$ is the maximum number of subtasks to complete in a trial task. The possible range for this metric is 0 to 2. To measure participants’ perceptions about their robot control experience under various assistance conditions, we administered a questionnaire including the fluency and trust scales by Hoffman (56) found in Table 1.
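The compound measure can be computed as in the sketch below, which assumes the measure reads s/s_max + ((t_max − t)/t_max)(s/s_max), a reading consistent with the stated range of 0 to 2. The function name and the input validation are illustrative additions.

```python
def compound_score(s, s_max, t, t_max):
    """Success-plus-timeliness score in the range [0, 2].

    s     -- number of subtasks completed in the trial
    s_max -- maximum number of subtasks in the trial task
    t     -- time taken to complete the s subtasks
    t_max -- maximum time allowed for the trial
    """
    if s_max <= 0 or t_max <= 0:
        raise ValueError("s_max and t_max must be positive")
    if not 0 <= s <= s_max or not 0 <= t <= t_max:
        raise ValueError("s must lie in [0, s_max] and t in [0, t_max]")
    completion = s / s_max              # fraction of subtasks completed
    time_saved = (t_max - t) / t_max    # fraction of the time budget left unused
    return completion + time_saved * completion


# Usage: two of three subtasks completed in 60 s of a 120 s (2 min) trial.
score = compound_score(s=2, s_max=3, t=60, t_max=120)
```

Because the timeliness term is multiplied by the completion fraction, finishing quickly earns credit only in proportion to how much of the task was completed; a trial with no completed subtasks scores 0 regardless of time.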



Fig. S1. Motion dataset study.

Fig. S2. Notation for technical details.

Movie S1. Shared control–based bimanual robot manipulation.


Acknowledgments: We thank B. Adams, M. Bugajska, and G. Trafton for help and feedback throughout the development of this work. The views and conclusions contained in this paper do not represent the official policies of the U.S. Navy. Funding: Funding for this work was provided by the Office of Naval Research through an award to L.M.H. This work was also supported in part by the National Science Foundation under award number 1830242 and the University of Wisconsin–Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. Author contributions: D.R. investigated the presented bimanual shared-control approach, implemented the proposed methods, ran the evaluations, and led the writing of the paper. B.M. assisted with concept formation, experimental design, and writing of the paper. M.G. assisted with concept formation, technical formulations, and writing of the paper. L.M.H. assisted with concept formation, experimental design, overseeing the physical setup of the prototype system, technical formulations, and writing of the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to support the conclusions of this manuscript are included in the main text or the Supplementary Materials. The code for the IK solver is available at Contact D.R. and L.M.H. for other code requests.
