Research Article | Human-Robot Interaction

On the choice of grasp type and location when handing over an object


Science Robotics  13 Feb 2019:
Vol. 4, Issue 27, eaau9757
DOI: 10.1126/scirobotics.aau9757

Abstract

The human hand is capable of performing countless grasps and gestures that are the basis for social activities. However, which grasps contribute the most to the manipulation skills needed during collaborative tasks, and thus which grasps should be included in a robot companion, is still an open issue. Here, we investigated grasp choice and hand placement on objects during a handover when subsequent tasks are performed by the receiver and when in-hand and bimanual manipulation are not allowed. Our findings suggest that, in this scenario, human passers favor precision grasps during such handovers. Passers also tend to grasp the purposive part of objects and leave “handles” unobstructed to the receivers. Intuitively, this choice allows receivers to comfortably perform subsequent tasks with the objects. In practice, many factors contribute to a choice of grasp, e.g., object and task constraints. However, not all of these factors have had enough emphasis in the implementation of grasping by robots, particularly the constraints introduced by a task, which are critical to the success of a handover. Successful robotic grasping is important if robots are to help humans with tasks. We believe that the results of this work can benefit the wider robotics community, with applications ranging from industrial cooperative manipulation to household collaborative manipulation.

INTRODUCTION

Robots are increasingly asked to perform tasks in environments shared with humans and other robots, and object manipulation is a skill demanded in many of these jobs. Humans effortlessly manipulate objects and their environment, learning this skill at an impressively young age. Enriching robots with similar abilities would enable them to complete assignments in multiple settings, such as households, medical care, industry, and agriculture. In addition, interaction and cooperation are desirable in a plethora of circumstances, allowing robots to work alongside and support workers while concurrently improving safety in such environments.

Unexpectedly, robots are still far from having similar skills and often fail to accomplish actions as simple as grasping. A correct and stable grasp (1) is the first step toward a successful manipulation of an object. Humans grasp an abundance of objects and pay little to no attention to how they perform this action. However, grasping is hardly a simple task to execute because it heavily involves both sensory and motor control systems (2). Five major factors influence a grasp choice (3, 4): object constraints (e.g., shape, size, and function), task constraints (e.g., force and mobility), gripper constraints (e.g., the human hand or gripper kinematics and the hand or gripper size relative to the object to be grasped), habits of the grasper (e.g., experience and social convention), and chance (e.g., the initial position of the object and environmental constraints). For instance, both the reaching movement of the arm and the grasping movement of the fingers may be influenced by the agent’s goal and intention to cooperate or to compete with a partner (5–7).

Intuitively, every object can be grasped in a number of ways, and the final choice of grasp type and position of the fingers on the object is dictated by a combination of the abovementioned five factors. In general, the emphasis is put on a subset of these factors (3, 4, 8–10). Nonetheless, there seems to be no consensus as to which factor (if any) plays the most important role behind a grasp choice because factors have different weight and influence in every situation. Grasping is not an action per se but rather a purposive action (11): Humans tend to grasp objects to use them. Hence, task and object constraints seem to be fundamental when reasoning about grasping and must be taken into due consideration when devising a grasping strategy for robots. In other words, grasp choices are strongly situation dependent (8), and this defining characteristic justifies studies on specific situations to provide and gather all of the information needed for a successful implementation on a robot.

On this point, robotic grasping has yet to reproduce the innate skills that humans show. Some of the reasons behind this lack of success can be traced to difficulties in handling uncertainties in physical models, perception, and control. The problem of purposive robotic grasping, although studied (12–17), has received considerably less attention than robotic object picking. The proliferation of analytic (1), data-driven (18), and learning-based (19–21) methods has brought general advancement in robotic grasping and object picking; however, tasks generally involve soft constraints, as is the case of bin sorting and pick-and-place tasks. For instance, pick-and-place tasks require robots to grasp and deposit objects onto surfaces or into bins, regardless of the objects’ final position and orientation. In contrast, the use of a tool for a certain task requires robots to grip the tool stably and, simultaneously, to pick up the tool so that it can be used correctly (hard constraints on the grasp).

For all these reasons, manipulative skills are increasingly essential in human-robot interaction. Cooperation and collaboration are complex compositions of activities that result in a natural flow of actions. Object handover is an example of joint action that involves one agent, the passer, transferring an object to a partner, the receiver. The implementation of an effective and natural-feeling human-robot handover is an unconquered challenge. Despite seeming a simple action, human-human handover is an effort of deduction and adjustment from both partners. Although the passer and receiver share the responsibility for the stability of the object, their goals differ during the interaction (22). The passer must properly present and safely release the object to the partner, whereas the receiver’s goal is to acquire a stable grasp of the object to subsequently perform tasks with it. Because the actions need to be properly coordinated in time and space, the passer and receiver exploit a wide range of subtle signals to predict the partner’s actions and rapidly choose the correct response. For instance, gaze and body and arm positions can be used to communicate the intent to begin the handover and predict where the handover will take place (23–26). In recent years, research on human-robot handover has focused on improving the predictability of the robot action and the fluency of the human-robot interaction by taking inspiration from human behavior. In particular, robotic systems were developed that were able not only to control the grip force exerted on the object (27–29) but also to provide nonverbal cues similar to humans during the interaction (25, 26, 30, 31). It was also shown that humans seldom understood the robot’s actions and that the interaction was perceived as unsafe or not natural when a robot did not deliver the appropriate cues (32–34).
Findings of work investigating how a robot arm should be positioned to perform a handover (26, 35) also showed that the object should be presented to the human agent in its default orientation and generally positioned to allow an easy grasp by the receiver. However, these studies considered neither the passer’s and receiver’s grasp type nor the subsequent action that the receivers had to perform with the object.

Therefore, how the selection of a grasp type and the position of the hands on the object change during a handover, with respect to a non-interactive manipulation task, is still an open research question. The present work sought to address this issue with a view to providing the information needed to design a control strategy, implementable on a robotic platform, for a seamless and natural handover, improving the effectiveness of human-robot interaction and of robotic grasping in general. Hence, in this investigation, we analyzed (i) the difference between grasp types chosen for a non-interactive action and grasp types chosen for handovers and (ii) whether a passer accounts for the receiver’s task when deciding how and where to grasp the objects. Considering that both hands are not always available to a human operator during some tasks, and considering the challenges yet to be overcome in robotic grasping and manipulation, in-hand and bimanual manipulation were not taken into consideration in the present work.

To this end, we conducted an experiment where 17 pairs of participants were asked to grasp and pass 17 objects to each other. The object list included a closed pen (CPen), an open pen (OPen), a key (Key), a screwdriver (Screwdriver), an adversarial three-dimensionally (3D) printed object (WShape), a plastic apple (Apple), a ball (Ball), a light disk (LDisk), a heavy disk (HDisk), a filled glass (FGlass), an empty glass (EGlass), a filled bottle (FBottle), an empty bottle (EBottle), a box of cheese crackers (Box), a thin book (Book), a metallic rod (Bar), and a toy monkey (Teddy). The experiment was composed of two sessions. In the non-interactive session (NIS), one participant (passer) was asked to grasp the objects from a table and perform two different tasks with each of the objects. In the handover session (HS), the passer was asked to hand the objects over to a partner (receiver) who subsequently performed the same two tasks the passer had performed in NIS. The first task was identical for all of the objects and consisted of putting the object onto a box. The second task was object specific. We tracked the objects and the hands of the participants using a vision system and manually labeled the grasp types of both agents through a frame-by-frame analysis of the video recordings. To avoid ambiguity in the classification, we adopted the following discriminative rule. A grasp was classified as a precision grasp when only distal and intermediate phalanges were used, whereas the grasp was classified as a power grasp when palm and proximal phalanges were also involved. This rule prompted us to develop a taxonomy (Fig. 1), which elaborated and modified three existing taxonomies in the literature (3, 8, 10). Last, we statistically analyzed the passer’s palm position relative to the objects and evaluated the differences across the four conditions of the experiment (two sessions times two tasks).
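In implementation terms, this discriminative rule reduces to a small predicate. The sketch below (Python; the encoding of hand regions is our own hypothetical one, since the actual labeling was done manually, frame by frame) captures the precision/power decision; intermediate grasps were labeled separately according to the taxonomy in Fig. 1:

```python
# Hedged sketch of the discriminative labeling rule: a grasp counts as
# "precision" when only distal and intermediate phalanges contact the
# object, and as "power" when the palm or proximal phalanges are also
# involved. The region names are a hypothetical encoding of ours.

PRECISION_REGIONS = {"distal", "intermediate"}

def classify_grasp(contact_regions):
    """Return 'precision' or 'power' given the set of hand regions in contact."""
    if set(contact_regions) <= PRECISION_REGIONS:
        return "precision"
    return "power"
```

For example, a fingertip-only pinch maps to "precision", while any contact involving the palm or a proximal phalanx maps to "power".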

Fig. 1 Taxonomy used to classify grasps in the experiment.

The proposed taxonomy comprises three top-level categories: power, intermediate, and precision grasps. Power and precision grasps are both subdivided into circular and prismatic types. Further classifications are reported with a higher level of detail, leading to 15 classes (power prismatic: HW, PW, and M&L W; power circular: D and S; intermediate: L, ST, and W; precision prismatic: 4F, 3F, 2F, and 1F; precision circular: D, S, and T). Our analysis was focused on the abovementioned classes. For the sake of completeness, we also reported all 28 grasp types included in the classes and used to classify the grasps during the labeling process. For each grasp, the taxonomy reports a picture showing the grasp and an alphabetical label in reference to the taxonomy it was taken from: C from (3) and F from (8). The images are taken and adapted with permission from (8).
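For reference, the 15 classes can be transcribed into a small data structure (class abbreviations as in Fig. 1; the nesting itself is our own illustrative encoding, with the intermediate grasps kept in a single unsplit group):

```python
# The 15 grasp classes of the proposed taxonomy (abbreviations from Fig. 1).
TAXONOMY = {
    "power": {
        "prismatic": ["HW", "PW", "M&L W"],
        "circular": ["D", "S"],
    },
    "intermediate": {
        # "L" = lateral and "W" = writing per the text; "ST" presumably stick.
        "ungrouped": ["L", "ST", "W"],
    },
    "precision": {
        "prismatic": ["4F", "3F", "2F", "1F"],
        "circular": ["D", "S", "T"],
    },
}

# Sanity check: the taxonomy defines 15 classes in total.
n_classes = sum(len(v) for sub in TAXONOMY.values() for v in sub.values())
```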

Our results show that, during the handover, passers preferred precision grasps, likely to maximize dexterity and to leave enough space for the receiver to comfortably grasp the object. Furthermore, when in-hand and bimanual manipulations were not allowed, passers tended to grasp the objects from the extremities, and in the presence of clear affordances (i.e., object parts that suggest specific actions, such as handles) (36–38), the position of their hand on the object was also influenced by the task the receiver had to perform. This choice allows receivers to accept the object immediately with the most appropriate grasp for the subsequent task, resulting in a high similarity between the receivers’ grasps and those exploited by participants in NIS.

RESULTS

Grasp type

A total of 5202 grasps were labeled in this work: 1734 in NIS and 3468 in HS (1734 for the passer and 1734 for the receiver). Overall, precision, power, and intermediate grasps covered 62.0% (3227 occurrences), 30.4% (1580 occurrences), and 7.6% (395 occurrences) of the registered grasps, respectively (Fig. 2). Objects such as Ball and Apple required a very limited set of grasps (mainly precision sphere and power sphere), whereas LDisk, HDisk, WShape, EBottle, and Screwdriver exhibited eight different grasps.

Fig. 2 Distribution of grasps throughout the experiment.

The heat map on the left-hand side reports the occurrences of each grasp type over the 17 objects. For each object, 306 grasps were labeled. The histogram on the right-hand side shows the overall frequencies of the grasp types normalized by the total number of the labeled grasps (i.e., 5202). Precision grasps are the majority (62% of all grasps), followed by 30% of power grasps and 8% of intermediate grasps.
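The normalization used in the histogram is straightforward to reproduce from the occurrence counts given in the Results:

```python
# Normalized frequencies of the three top-level grasp categories,
# recomputed from the occurrence counts reported in the text.
counts = {"precision": 3227, "power": 1580, "intermediate": 395}
total = sum(counts.values())  # 5202 labeled grasps in total
freqs = {k: round(100 * v / total, 1) for k, v in counts.items()}
# freqs == {"precision": 62.0, "power": 30.4, "intermediate": 7.6}
```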

When the performance of the passers was analyzed over the tasks as shown in Fig. 3A, power grasps decreased from 29.8% in NIS to 20.9% in HS in task 1, whereas the drop was from 42.2% in NIS to 18.9% in HS in task 2. Intermediate grasps were less evidently affected by the NIS/HS condition. In particular, passers used more precision 3-fingers and 2-fingers and intermediate lateral grasps in both tasks in HS than in NIS (Fig. 3A′). When the performance of the passers was analyzed over the sessions as shown in Fig. 3B, in NIS, task 2 exhibited more power and intermediate grasps than task 1, with an increase of task-specific grasps such as intermediate writing and of power prismatic Heavy Wrap and Palmar Wrap (Fig. 3B′). However, the percentages of precision grasps were higher in both tasks and sessions, with frequencies of 50.9 and 73.6% for NIS and HS, respectively. When the grasps performed by the passers in NIS were compared with those performed by the receivers in HS, the two distributions were noticeably similar (Fig. 4), the main difference being a higher percentage of intermediate grasps among the receivers in HS.

Fig. 3 Comparison of the performances of the passers in NIS and HS.

(A) Histograms of the grasp choices over the two tasks; (B) histograms of the grasp choices over the two sessions (NIS and HS). (A′) and (B′) depict the same information but show higher levels of granularity in the grasp types. The frequencies shown are normalized by the total of 867 grasps performed by passers for each task of each session.

Fig. 4 Comparison of the performances of the passers in NIS and receivers in HS.

(A) Histograms of the grasp choices over the two tasks; (B) histograms of grasp choices over the two conditions (passers in NIS and receivers in HS). (A′) and (B′) depict the same information but show higher levels of granularity in the grasp types. The frequencies shown are normalized by the total of 867 grasps performed by passers or receivers for each task of each session.

Last, Fig. 5 shows a comparison between the grasps that passers and receivers executed during HS. The high values in the diagonal of the heat map display a high similarity in the grasps. However, the top-right semi-plane is more populated than the bottom-left semi-plane, showing the tendency in receivers to use more power grasps than passers.

Fig. 5 Comparison of grasp types between passers and receivers during handovers.

The heat map reports all 1734 combinations of the passer’s and receiver’s grasps observed in HS. The highest number of occurrences is found on the diagonal of this map. The red top-right semi-plane represents occurrences of more powerful grasps adopted by the receivers (with respect to passers) and is more populated than the bottom-left semi-plane (more powerful grasps by the passers with respect to receivers).
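The reading of the heat map (diagonal versus the two triangles) can be illustrated on a toy passer-by-receiver count matrix; the numbers below are invented for illustration and are not the experimental data. Rows index the passer's grasp and columns the receiver's, both ordered from most precision-like to most power-like:

```python
# Compare the diagonal (same grasp class for passer and receiver) with the
# upper triangle (receiver more power-like than the passer) and the lower
# triangle (passer more power-like) of a grasp-combination count matrix.

def triangle_sums(m):
    n = len(m)
    diag = sum(m[i][i] for i in range(n))
    upper = sum(m[i][j] for i in range(n) for j in range(i + 1, n))
    lower = sum(m[i][j] for i in range(n) for j in range(i))
    return diag, upper, lower

# Invented 3x3 example (precision, intermediate, power):
counts = [
    [40,  9,  3],
    [ 2, 25,  6],
    [ 1,  4, 10],
]
diag, upper, lower = triangle_sums(counts)
```

Here the diagonal dominates (75 of 100 counts) and the upper triangle (18) outweighs the lower (7), the same qualitative pattern reported for Fig. 5.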

Object occlusion

For each object and each of the four conditions of the experiment (two sessions times two tasks), we reported (Fig. 6) the value of Pac of each of the 17 passers, where Pac is the median value of the approaching coordinate of the passer’s palm relative to the object (defined as shown in Fig. 7) across the three repetitions of each task. The distribution of Pac differed across objects and conditions, showing a strong influence of both task and object constraints (Fig. 6). Palm positions did not change notably when the objects had less stringent geometric constraints, as in the cases of Ball and Apple in light of their spherical symmetry. However, passers rearranged the position of the palm when task or object constraints were significant.

Fig. 6 Distribution of palm positions of the passers relative to the objects.

Plots show the median approaching coordinate Pac of all the 17 passers relative to each object across sessions and tasks. The significant comparisons (P < 0.05) performed on |Pac| and on |dPac|, using the Wilcoxon test with the Bonferroni corrections, are reported with (*) and (★), respectively. In particular, |Pac| describes how the grasping locations are shifted toward the extremities of the objects, whereas |dPac| describes how they are clustered far from the median of the distribution. The bottom-right plot shows whether Key and Screwdriver were grasped by the handle or by the other extremity (Not Handle).

Fig. 7 Experimental setup and test objects set.

(Top) Left: Outlines of the three types of objects that we used and their frame system identified by the axes x, y, and z along their major dimensions X, Y, and Z (XYZ), respectively. These drawings show, for each type of object, the distance vector d from the centroid of the object (CO) to the centroid of the passer’s hand (CH) and the approaching coordinate of the passer’s hand in the object frame (ac). Right: The experimental setup, the test objects, and the test gloves are shown. (Bottom) The table reports, for each object, its label, its three major dimensions, its mass, the mathematical definition of ac, and the tasks. We refer to dx, dy, and dz as the components of the vector d along x, y, and z.
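As a sketch of the geometry in Fig. 7 (function name and example values are ours): the approaching coordinate is the signed component, along an object-specific unit axis of the object frame, of the vector from the object centroid CO to the hand centroid CH:

```python
# Signed projection of the CO -> CH vector onto a unit axis of the object
# frame. Which axis is used (and any further combination of dx, dy, dz)
# is object specific, as listed in the table of Fig. 7.

def approaching_coordinate(co, ch, axis):
    """Dot product of (CH - CO) with a unit axis of the object frame."""
    d = [h - o for h, o in zip(ch, co)]  # components dx, dy, dz
    return sum(di * ai for di, ai in zip(d, axis))

# Example: hand centroid 6 cm from the object centroid along the z axis.
ac = approaching_coordinate(co=(0.0, 0.0, 0.0), ch=(0.0, 0.0, 0.06), axis=(0.0, 0.0, 1.0))
```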

For each object, four comparisons (NIS task 1 versus HS task 1; NIS task 2 versus HS task 2; NIS task 1 versus NIS task 2; HS task 1 versus HS task 2) were performed using the Wilcoxon test (adjusted with the Bonferroni correction) on both the absolute values of Pac (|Pac|) and on the absolute distance between Pac and the median of the distribution of Pac of each condition (|dPac|). In particular, we analyzed |Pac| to investigate the shift of the passer’s palm position toward the extremities of the objects. In addition, we analyzed |dPac| to investigate how the palm positions were clustered around the median of each condition. The complete statistical results are reported in Table 1.
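The statistical procedure can be sketched as follows. In practice one would call a library routine such as scipy.stats.wilcoxon, but the minimal pure-Python version below, which assumes no zero differences and no tied magnitudes and enumerates the exact null distribution (feasible only for small samples), makes the test and the Bonferroni adjustment explicit:

```python
from itertools import product

def wilcoxon_exact_p(diffs):
    """Two-sided exact p-value of the Wilcoxon signed-rank test.

    Minimal sketch: assumes nonzero, untied differences and enumerates
    all 2**n sign assignments (exact, but only viable for small n).
    """
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # rank of |diffs[i]|, 1-based
    w_obs = sum(r for d, r in zip(diffs, ranks) if d > 0)
    # Null distribution of the positive-rank sum W+ over all sign patterns.
    ws = [sum(r for s, r in zip(signs, ranks) if s)
          for signs in product([0, 1], repeat=n)]
    ge = sum(w >= w_obs for w in ws) / len(ws)
    le = sum(w <= w_obs for w in ws) / len(ws)
    return min(1.0, 2 * min(ge, le))

def bonferroni(pvals):
    """Bonferroni adjustment, as applied to the p-values of Table 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]
```

With three uniformly positive differences, for example, wilcoxon_exact_p([0.8, 1.5, 2.1]) returns 0.25, which Bonferroni correction over four comparisons then raises to 1.0.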

Table 1 Statistical results.

Results of the comparisons performed using the Wilcoxon test on the absolute value of the passer’s palm position relative to the object (|Pac|) and the absolute distance between the median of the distribution of each condition and the passer’s palm (|dPac|). The P values reported are adjusted according to the Bonferroni correction. Significant results (P < 0.05) are boldface.


|Pac| values differed between NIS task 1 and HS task 1 for CPen, OPen, Key, WShape, HDisk, LDisk, and Bar. Comparisons of |Pac| values between NIS task 2 and HS task 2 were significant for OPen, Key, FGlass, EGlass, FBottle, Bar, Screwdriver, and Teddy. |Pac| values in NIS task 1 and NIS task 2 were statistically different for Key, WShape, HDisk, LDisk, FGlass, EGlass, Box, and Book. Last, |Pac| values significantly differed between HS task 1 and HS task 2 for Key and Ball. In particular, comparisons between NIS and HS showed that the passer’s palm was closer to the extremities of the objects in HS than in NIS. However, when the second task was an inserting task (i.e., for the disks, EBottle, etc.), HS and NIS did not differ significantly.

The analysis of |dPac| values showed that NIS task 1 and HS task 1 were statistically different for WShape, EBottle, and Screwdriver. |dPac| values differed between NIS task 2 and HS task 2 for objects such as CPen, Ball, FGlass, EGlass, FBottle, Screwdriver, and Bar. Comparisons of |dPac| values between NIS task 1 and NIS task 2 were significant for WShape and Bar, whereas HS task 1 and HS task 2 differed only for Book. These comparisons show that, generally, the passers’ palm positions were clustered further away from the median in HS than in NIS. However, when the second task was an inserting task, as already observed for |Pac| values, HS and NIS did not differ significantly.

In NIS, Key and Screwdriver were grasped from the handle in 45.1 and 100%, respectively, of the cases in task 1. Key and Screwdriver were grasped from the handle in 100% of cases for task 2. Passers grasped Key and Screwdriver from the handle in 84.3 and 60.8%, respectively, of the cases in HS when the receiver had to simply put the object on the box. These frequencies strongly decreased when the receiver needed to grasp the handle of the object to perform the object-specific task. In this condition (HS task 2), passers grasped the Key and Screwdriver from the handle only in 5.9 and 21.6%, respectively, of cases. In all the other cases, the passer left the handle completely unobstructed.

DISCUSSION

There are multiple factors influencing a grasp, and constraints deriving from the task, the object, and the environment have to be assessed in each and every situation. Despite being a consequence of multiple sources of constraints, grasping is incontrovertibly a purposive action, and “it is the nature of the intended activity that finally influences the pattern of the grip” [(11), p. 906]. Anatomical considerations and constraints introduced by the objects, e.g., size and shape, also play firm roles in the choice of grasp.

Thus, objects can generally be gripped with different grasps. However, a handover introduces additional constraints to grasping and strongly influences the final choice. Therefore, the aim of this work was to assess the change in grasp type and in hand placement between a non-interactive action and a handover. To this end, we performed an experiment comparing participants grasping objects for a direct use (in NIS) and for an object handover (in HS). The experimental results of this work show that passers modify their grasp choice, favoring precision grasps during the handover and allowing the receiver to perform the same grasps used in NIS. We also observed that passers accounted for the receiver’s subsequent task and changed their grasp strategy, especially when passers had to pass objects with a clear affordance, such as Key and Screwdriver, which are equipped with handles.

Ambiguities in grasp labeling

Grasping is a very intricate sequence of actions, and its classification into a finite set of types is challenging and ambiguous. Although there seems to be some consensus on the definition of power and precision grasps (3, 4, 11, 39), every grasp aims to manipulate an object (precision element) while stably holding it (power element). Thus, elements of precision handling and power grip usually coexist in the same action (11, 40, 41), and consequently, a grasp is generally classified by its predominant element.

To help disambiguate between potentially similar-looking power and precision grasps, we classified a grasp as a precision grasp when only distal and intermediate phalanges were used, whereas a grasp was classified as a power grasp when palm and proximal phalanges were also involved. This rule prompted us to partially modify the existing taxonomies in the literature (3, 8, 10) to be as consistent as possible in the process of labeling. The discrimination between power and precision grasps performed in this work is in agreement with previous studies about human grasp (11, 39, 42), which found that the palm and the proximal part of the hand are involved in a power grip. Therefore, we firmly believe that this rule did not bias the results of the classification. Moreover, the rule implicitly suggests how much of the object surface was obstructed by the use of each grasp type. Power grasps occlude a bigger portion of the grasped object, whereas precision grasps leave more of the surface unencumbered. This aspect is of prime importance during a handover, where both people have to share the limited surface of the same object. Our classification rule might be translated into a control strategy to be implemented on robots for a more purposive grasping and for a more effective handover.

Object characteristics and task constraint do influence grasping

For our experimental setup, we used a number of objects differing in shape and size (Fig. 7). Feix et al. (43) showed that objects with a grasp side wider than 5 cm and a mass over 250 g are more likely to be grasped with a power grasp. Counterintuitively, objects with mass less than 200 g and whose grasp side is smaller than 5 cm are not necessarily grasped using a precision grasp; power and intermediate grasps are also frequently used. Our results show a prevalence of precision grasps (62%), even though the characteristics of the objects used in this experiment varied extensively (Fig. 2). In particular, 53% of our objects had at least one dimension smaller than 5 cm; this is important because humans display a clear preference to grasp the smallest dimension of an object (43). Furthermore, most (59%) of the objects in our set weighed less than 200 g, and only two objects were heavier than 500 g. In contrast with the work of Feix et al. (43, 44), we observed that objects heavier than 250 to 300 g (such as FBottle, FGlass, HDisk, and Ball) did not induce a clear preference toward power grasps during our experiment. It is our opinion that this difference may be a consequence of the different setup (environment) and tasks between the two investigations. This hypothesis is in agreement with discussions in Feix et al. (43, 44) and other previous works, such as (4, 11), stating that grasp choices can change and shift because of dissimilarities in the initial conditions, friction, personal factors (such as habits), and task demands. In (43, 44), the actions of two housekeepers and two machinists were observed during their daily work, in which the most common tasks, such as mopping or turning knobs, needed some strength to be performed. On the contrary, the tasks we asked people to perform were quite easy and did not require much force. Participants in NIS and receivers in HS were first asked to simply grasp the objects and put them onto a box.
Our results indicate that, for this action, the object and task constraints in most of the cases were not strong enough to encourage the participants to use power grasps rather than precision grips. Participants in NIS and, similarly, receivers in HS favored precision grasps in 70.1 and 61.2%, respectively, of cases (Figs. 3 and 4). As expected from our hypothesis, results similar to those of Feix et al. (43, 44) were found when the participants were asked to use the target objects for a more specific assignment (task 2). This condition triggered the participants in NIS and the receivers in HS to increase the frequency of power grasps up to 42.2 and 40.8%, respectively. In addition, a very clear preference toward power grasps was observed for Bar (Fig. 2), the biggest object in our dataset and the only one heavier than 600 g.

Handover entails precision grasps for passers but not for receivers

A shift toward precision grasps was observed during the HS of our experiment. In both tasks, passers in HS grasped the objects with precision or intermediate grasps more often than in NIS (Fig. 3). Regardless of the task carried out by the receiver, passers exhibited a strong preference for precision grasps, recorded in 73% of the cases, whereas power grasps were displayed in less than 21% of the cases. This predisposition might be a consequence of the main objective of the passer, which is to accomplish an efficient and fluent object handover to the receiver. To this end, it is likely that passers prefer grasp types that allow for dexterous positioning and orientation (and adjustment) of the object. Fingertips are the area of the human hand with the highest density of mechanoreceptors (45, 46), and their use emphasizes accuracy and sensitivity over stability (47, 48). In detail, the skin of the fingertips appears to have the highest density of FA I, SA I, and FA II units, which convey tactile information to the brain. This is fundamental during precise/dexterous object manipulation (49) and hence critical during handovers. SA I units are particularly sensitive to static forces applied on the object; therefore, they play a key role in the feedback control of the grip force of the passer on the object. In contrast, FA I and FA II units encode fast events between the skin and the object, such as friction, and events acting on the hand-held object, such as the contact of the receiver on the object. This information is crucial for the passer to behave reactively in correction of slippage events (50) and to begin the release of the object in coordination with the grasping action of the receiver (22).

An additional point of consideration is the necessity for passers to leave enough unobstructed surface area on the object to facilitate the grasp of the receiver. This detail is in agreement with previous research in neuroscience (6, 51, 52) that showed the influence of the after-grasp task on grasp parameters such as the preshaping of the fingers. Unequivocally and by definition, precision grasps obstruct a smaller portion of the object than power grips, thus leaving the receivers with enough space to choose a grasp from a wider range of possibilities.

Furthermore, receivers were able to grasp and subsequently use the objects in both tasks of our experiment, exhibiting grasps similar to those used by the participants in NIS, who had only the aim to use the objects directly (Fig. 4). Consequently, passers and receivers did not use the same grasp types in HS, because receivers used more power grips than passers (Fig. 5). Together, these outcomes suggest that, unlike passers, receivers do not shift their grasp choice toward precision during the handover because they prefer, whenever possible, a grasp that allows them to accomplish the subsequent task easily and comfortably.

Handover and subsequent action of the receiver influence the grasp location of the passer

Our experiment additionally shows that passers modulate not only their grasp type but also their grasping location to allow a comfortable and ready-to-use grasp to the receiver (Fig. 6). The analysis of the position of the palm of the passers relative to the objects shows a shift toward the extremities of the objects in HS. In the case of relatively compact objects such as WShape and the two disks, passers placed their palm further away from the center of the object with respect to when they had to put the same objects on the box during NIS. The distribution of the approaching coordinate of the palm of the passer relative to the frame of long objects (such as the glasses, bottles, Screwdriver, and pens) tends to widen in HS, moving from the median of the distribution and clustering near the extremities of the objects.

Comparable distributions to HS were observed in NIS when the participants had to insert objects into a hole (WShape, disks, EBottle, CPen, and Bar). This similarity suggests that, similar to the case of inserting tasks where participants know that they have to leave an object’s side free to comfortably complete the action, passers consider the space needed by the receiver to easily grasp the object during the handover.

These results are in agreement with the qualitative study of Strabala et al. (26), suggesting that passers tend to present objects in a way that allows easy grasping by the receiver. The same authors observed that, in some instances, passers rotated objects, such as a cup, so that the receivers could grab the handle. Moreover, orientation is a real spatial property with distinct visual appearance that affords action (53). Our work expands those results, suggesting that this behavior is strongly influenced by the task that the receiver has to perform with the object after the handover. We observed that, when the receiver had to grasp the Key and Screwdriver by the handle to perform the object-specific task, the passer grasped the tip of the object, leaving the handle free (Fig. 6). On the contrary, passers often passed the Key and the Screwdriver grasping the handle when the receiver’s task was a placing task (task 1).

In a study about how humans interact with and use tools, Gibson (54) defined the possibilities for action offered by objects or by the environment as object affordances. The term “perceived affordance” was later introduced and popularized by Norman (55) in the context of human-machine interaction, making this concept dependent not only on the agents’ capabilities but also on their goals and past experience. In line with this theory, our results show that a clear object affordance, such as a handle, may invite different behaviors not only during single-agent actions (56, 57) but also during cooperative tasks such as handovers. In particular, we suggest that, on the basis of their past experiences, passers reason about which area of the object can afford the receiver’s subsequent task and adapt their grasp strategy to appropriately present the object to the partner.

Conclusion and future work

Globally, our results indicate that passers do modulate the grasp considering not only their own task constraints (to fluently release the object) but also the requirements of the receiver’s tasks to perform an efficient and fluent interaction. In particular, passers likely make predictions about their partner’s action using knowledge about the task that receivers want to perform after the handover and adjust their behavior according to this prediction. This hypothesis is in agreement with a recent neuroscientific study (58) that demonstrated that, when two agents are involved in a joint action requiring a degree of interpersonal coordination, they are able to predict the other agent’s action (and the effects of the other agent’s action) and to integrate the prediction into the motor plan that guides their movements throughout the interaction (in feedforward and in feedback).

To return to our initial questions about how a handover modifies grasp choices, we suggest that passers adjust both their choice of grasp and their palm placement to facilitate the receiver. We are fully aware that numerous areas in this field of research need future work. A natural extension includes the tracking and analysis of finger placement alongside the palm. Finger placement data would offer a deeper understanding of the adjustments that passers perform during handovers. Moreover, tracking the fingers might also help with the classification of grasps: these data, in conjunction with the disambiguation rule adopted in this paper, should decrease the ambiguity in grasp classification. On the downside, the complexity of the tracking system would increase, as would the risk of affecting participants’ dexterity, influencing their grasps and their hand and finger locations on the object. In parallel, the addition of novel tasks and objects (e.g., objects heavier than those we used and invented objects of original shapes) might bring new insights into the shift of grasp choice and the impact that diverse tasks and object attributes have on hand placement.

In conclusion, we firmly believe that implementing the disambiguation rule and the conclusions of this work will improve the effectiveness of robotic grasping across a comprehensive range of applications. For instance, considering object and task constraints will help in choosing grasp types that allow industrial robots not only to grasp objects and tools but also, more importantly, to use them to successfully complete more refined tasks. In terms of collaborative manipulation, the conclusions of this work are a step toward augmenting the effectiveness and naturalness of object handovers, which are necessary for enhanced cooperation and collaboration.

MATERIALS AND METHODS

Participants

Seventeen pairs of people (34 participants in total; 23 males and 11 females; mean age, 30.9 years; SD, 7.9) took part in the experiment. All participants were healthy and right-handed, reported normal vision, and were not aware of the purpose of the experiment. All took part on a voluntary basis and gave their signed consent to participate. This project was reviewed within the Science and Engineering Faculty Low Risk Ethics process, confirmed to be Low Risk, approved, and confirmed to meet the requirements of the National Statement on Ethical Conduct in Human Research. It was considered and approved by the Chair, Queensland University of Technology Human Research Ethics Committee, under reference number 1800000243 in the Ethics Category: Human Negligible-Low Risk.

Experimental setup and protocol

The experimental setup consisted of a visual tracking system, 17 test objects, two thin fingerless test gloves, infrared passive markers (IRm), a box with holes of different shapes, and a reference object (Fig. 7). We used the OptiTrack capture system (http://optitrack.com/) as the visual tracking system. Ten Flex 13 cameras were placed around the scene: Eight cameras were set to tracking mode and tracked the position of IRm at 120 Hz, and two cameras were set to reference mode and recorded videos of the experiment at 30 fps (Fig. 7). The OptiTrack system sent the tracking data and video recordings to a PC that ran the Motive software.

To sensorize the test objects and gloves without affecting or influencing the grasp choice of the participants, we used different patterns and types of IRm (Fig. 7). In particular, very small spherical IRm (diameter, 3 mm) or IRm tape were used on small objects and wherever possible; slightly larger IRm (diameter, 8 mm) were used only on the larger objects. The gloves were sensorized only on the back, leaving the palm surface completely free. In addition, the different patterns of IRm allowed the software to distinguish the objects, assigning a unique coordinate frame to each of them and recording both position and orientation at 120 Hz. To simplify the data analysis, we also placed the markers so that the axes of the objects’ frames lay along the three major dimensions of the objects. A detailed list of the 17 test objects, with their sizes and masses, is reported in Fig. 7. Last, a reference object (triangular in shape and fixed in the scene) was used to set the global reference frame of the OptiTrack system.

In each pair of participants, one was asked to play the role of the passer and the other the role of the receiver. Both participants were asked to use only the right hand during the experiment, and that hand was instrumented with a test glove. The experiment comprised two sessions. The aim of the first session (NIS) was to observe how a single person grasps objects for direct use. The second session (HS) sought to analyze how the grasp changes when a person has to pass the same objects to a partner while knowing the task the partner has to perform with them. Only the passer was involved in NIS; the participant stood in front of a table under the visual tracking system and was asked to repeatedly grasp the 17 test objects, placed one at a time on the table, and subsequently to perform a task. The experiment included two tasks per object. The first task (task 1) was a general placing task and required moving the object onto a box placed on the right-hand side of the participant. The second task (task 2) was an object-specific action and differed according to the object’s characteristics and function (Fig. 7). When task 2 consisted of an inserting action, participants had to insert the object into a specific hole of the box. When task 2 was a pouring action, participants had to pour the contents of the object into a cup on the table. The protocol required the passer to grasp each object three times to perform task 1 and then three times to perform task 2.

In the subsequent session (HS), the passer and receiver stood (both under the visual tracking system) at opposite sides of the table and were asked to collaborate, allowing the receiver to perform the same two tasks already carried out by the passer in NIS. In particular, the passer had to pass each of the 17 test objects to the receiver six times. During the first three handovers, the receiver was asked to grasp the object from the passer and use it to perform task 1; the receiver had to perform task 2 in the following three handovers.

Participants were asked in both sessions to put the instrumented hand on the table before the beginning of each trial. Then, the experimenters placed the object on the table in the appropriate position (each object had a specific initial position, kept constant throughout the experiment to ensure repeatability) and informed both participants of the task to perform. A verbal command (“Go”) from the experimenters signaled the start of the action.

Participants were asked to use only one hand and to minimize (or avoid altogether, if possible) in-hand manipulation of the objects. This instruction was motivated by the following: (i) given the difficulties that still exist in implementing robotic bimanual and in-hand manipulation, the results of this work can be promptly applied to many current robotic platforms; (ii) in everyday life, receivers may ask for objects so that they are directly usable, e.g., when asking for an object without directly seeing the passer pass it or when asking for a tool while performing a job with the other hand busy; and (iii) it allowed a reliable classification of the grasps used by the participants. The experimenters visually monitored the execution of the trials to verify their correctness. A trial was repeated whenever the visual tracking failed. The order of the test objects to be grasped was chosen randomly. In our opinion, this order was not relevant because all the objects can likely be found in any household and are often used in everyday life. A break of 5 to 10 min was included between the two sessions, and the average duration of the experiment was 1.5 hours.

Data analysis

We analyzed the videos recorded by the two cameras set to reference mode and manually classified the grasps performed by passers and receivers in each trial of NIS and HS. Illustrative examples of trials can be found in movie S1. Classifying grasp type is not trivial, mainly because the differences among grasps are sometimes small. Therefore, for labeling purposes, we used all 28 grasp types shown in Fig. 1. Such a high level of detail in grasp type discrimination helped us to resolve ambiguities and to be consistent during the classification process.

The grasps in NIS were labeled when the passer completed the grasping action and firmly held the test object; grasps in HS were selected as close as possible to the moment of the handover. The Motive software aligns video frames with the tracking data; hence, it was possible to extract the instant of time (Tg) at which the labeled grasps occurred through a frame-by-frame analysis. Once this analysis of the video recordings was completed, we computed the frequencies of grasp occurrences over the sessions (NIS and HS), over the objects, over the tasks (task 1 and task 2), and over the roles (passer and receiver). These analyses were performed focusing on the classes of the two lowest levels of the taxonomy tree in Fig. 1. This choice is in line with other studies in the literature, such as (59), where only nine general classes of grasps are used to perform the analyses.
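The frequency analysis above amounts to counting grasp labels grouped by session, object, task, and role. A minimal sketch in Python follows; the trial records, field names, and grasp labels are hypothetical illustrations, not the authors' actual data format:

```python
from collections import Counter

# Hypothetical trial records: (session, role, object, task, grasp_label).
# Field order and label names are our own illustrative assumptions.
trials = [
    ("NIS", "passer", "Screwdriver", 1, "power_cylindrical"),
    ("HS", "passer", "Screwdriver", 1, "precision_prismatic"),
    ("HS", "receiver", "Screwdriver", 1, "power_cylindrical"),
    ("HS", "passer", "Screwdriver", 2, "precision_prismatic"),
]

def grasp_frequencies(trials, group_by):
    """Count grasp-label occurrences grouped by the requested trial fields."""
    fields = {"session": 0, "role": 1, "object": 2, "task": 3}
    counts = Counter()
    for trial in trials:
        key = tuple(trial[fields[f]] for f in group_by)
        counts[(key, trial[4])] += 1  # (group key, grasp label) -> count
    return counts

# Frequencies of each grasp label per (session, role) pair.
freqs = grasp_frequencies(trials, ["session", "role"])
```

The same function can be reused with `["session", "object"]` or `["session", "task"]` to obtain the per-object and per-task breakdowns.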

The position and orientation of the objects and of the gloves worn by participants were used to study the obstruction of the objects’ surface caused by the position of the passer’s hand. We refer to the centroid of the passer’s hand as CH and to the centroid of each object as CO. For each object, we identified its three major dimensions X, Y, and Z, with X ≥ Y ≥ Z. The axes along these three dimensions defined the frame of each object and were denoted x, y, and z, where x was along the longest object dimension and z was along the shortest. The origin of the object frame was placed on CO (Fig. 7). For each time instant, we computed the homogeneous transformation matrix relating the coordinate frame of the passer’s palm to the frame of the object. From the transformation matrix, we obtained the distance vector d from CO to CH, with components dx, dy, and dz along the axes x, y, and z, respectively. This allowed us to evaluate over time the approaching coordinate of the passer’s hand in the object frame (ac) (Fig. 7). In particular, when X > Y ≥ Z, ac was defined as the projection of d along the major axis x and coincides with dx. When X = Y > Z (disks), the object has neither a unique major axis nor a defined orientation on the plane xy, because x and y can be rotated around z without any change in length; in this case, we defined ac as the magnitude of the projection of d on the plane xy, computed as ac = √(dx² + dy²). When X = Y = Z (spheres), because any cross section of the object is equally circular and the object has neither a major axis nor a defined orientation in 3D space, we defined ac as the magnitude of d, computed as ac = √(dx² + dy² + dz²). The median value of ac over the 10 samples following Tg was computed and used to calculate, for each participant, the median across the three repetitions of each task of each session (Pac).
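The case analysis for the approaching coordinate ac can be sketched in a few lines of Python. This is a minimal illustration under the conventions described above; the numeric values and the dimension-comparison tolerance are our own assumptions, not taken from the study:

```python
import numpy as np

def approaching_coordinate(d, dims, tol=1e-9):
    """Approaching coordinate ac of the passer's palm in the object frame.

    d    : (dx, dy, dz), palm centroid CH relative to object centroid CO,
           expressed in the object frame (i.e., the translation part of the
           palm-to-object homogeneous transform).
    dims : object dimensions (X, Y, Z), sorted so that X >= Y >= Z.
    tol  : tolerance for deciding that two dimensions are equal (assumed).
    """
    X, Y, Z = dims
    dx, dy, dz = d
    if X - Y > tol:                              # X > Y >= Z: unique major axis,
        return dx                                # ac is the projection on x
    if Y - Z > tol:                              # X = Y > Z (disks):
        return float(np.hypot(dx, dy))           # ac = sqrt(dx^2 + dy^2)
    return float(np.linalg.norm([dx, dy, dz]))   # X = Y = Z (spheres): ac = |d|

# A long object (e.g., 0.20 x 0.05 x 0.02 m): ac is simply the x component.
d = (0.10, 0.02, 0.01)                           # illustrative palm offset (m)
ac = approaching_coordinate(d, (0.20, 0.05, 0.02))
```

In a full pipeline, `d` would be read from the translation column of the 4x4 transform at each sample, and the median of ac over the 10 samples after Tg would give the per-trial value.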

Last, for each object and for each of the four conditions of the experiment (two sessions times two tasks), we evaluated the absolute value of Pac (|Pac|) and the absolute difference between each Pac value and the median of Pac across participants (|dPac|). Specifically, we aimed to investigate how far the distributions of Pac were shifted toward the object’s extremities and how tightly they were clustered around the median; these questions were addressed with the analyses of |Pac| and |dPac|, respectively, across the conditions. Therefore, we carried out four comparisons for each object on both |Pac| and |dPac| (NIS task 1 versus NIS task 2; HS task 1 versus HS task 2; NIS task 1 versus HS task 1; NIS task 2 versus HS task 2) using the Wilcoxon test adjusted with the Bonferroni correction. Statistical significance was set at P < 0.05. For the two test objects with a defined handle, the Key and the Screwdriver, we also manually annotated how many times passers grasped them by the handle in each of the four conditions of the experiment.

SUPPLEMENTARY MATERIALS

robotics.sciencemag.org/cgi/content/full/4/27/eaau9757/DC1

Movie S1. Example of trials with the views of the object tracking.

REFERENCES AND NOTES

Acknowledgments: We thank all the participants for taking the time to participate in the experiment and A. Marmol and M. Strydom for help with the OptiTrack motion capture device. Funding: V.O. and P.C. are supported by the Australian Research Council Centre of Excellence for Robotic Vision (project number CE140100016). F.C. and M.C. are supported by the European Research Council (project acronym MYKI; project number 679820). Author contributions: F.C. and M.C. initiated this work. M.C., F.C., and V.O. designed the experiments. V.O. and F.C. built the experimental setup and performed the experiments. All authors analyzed the data and discussed and wrote the manuscript. M.C., V.O., and P.C. oversaw and advised the research. P.C. provided financial support. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions are available in the paper or the Supplementary Materials. Please contact F.C. for other data and materials.