Research Article | HUMAN-ROBOT INTERACTION

A tale of two explanations: Enhancing human trust by explaining robot behavior


Science Robotics, 18 Dec 2019: Vol. 4, Issue 37, eaay4663
DOI: 10.1126/scirobotics.aay4663
  • Fig. 1 Overview of demonstration, learning, evaluation, and explainability.

    By observing human demonstrations, the robot learns, performs, and explains using both a symbolic representation and a haptic representation. (A) Fine-grained human manipulation data were collected using a tactile glove. On the basis of the human demonstrations, the model learns (B) symbolic representations by inducing a grammar model that encodes long-term task structure to generate mechanistic explanations and (C) embodied haptic representations using an autoencoder to bridge the human and robot sensory input in a common space, providing a functional explanation of robot action. These two components are integrated using (D) the generalized Earley parser (GEP) for action planning. These processes complement each other in both (E) improving robot performance and (F) generating effective explanations that foster human trust.

  • Fig. 2 Illustration of learning embodied haptic representation and action prediction model.

    An example of the force information in (A) the human state, collected by the tactile glove (with 26 dimensions of force data), and force information in (C) the robot state, recorded from the force sensors in the robot’s end effector (with three dimensions of force data). The background colors indicate different action segments. For equivalent actions, the human and the robot may take different amounts of time to execute them, resulting in different action segment lengths. (B) Embodied haptic representation and action prediction model. The autoencoder (yellow background) takes a human state, reduces its dimensionality to produce a human embedding, and uses the reconstruction to verify that the human embedding retains the essential information of the human state. The embodiment mapping network (purple background) takes in a robot state and maps it to an equivalent human embedding. The action prediction network (light blue background) takes the human embedding and the current action and predicts what action to take next. Thus, the robot imagines itself as a human based on its own haptic signals when predicting its next action. A minimal code sketch of these three networks is given after the figure captions.

  • Fig. 3 An example of action grammar induced from human demonstrations.

    Green nodes represent And-nodes, and blue nodes represent Or-nodes. Probabilities along the edges emanating from Or-nodes indicate the parsing probabilities of taking each branch. Grammar model induced from (A) 5 demonstrations, (B) 36 demonstrations, and (C) 64 demonstrations. The grammar model in (C) also shows a parse graph highlighted in red, where the red numbers indicate the temporal ordering of actions. A minimal code sketch of this And-Or structure is given after the figure captions.

  • Fig. 4 Robot task performance on different bottles with various locking mechanisms using the symbolic planner, haptic model, and the GEP that integrates both.

    (A) Testing performance on bottles observed in human demonstrations. Bottle 1 does not have a locking mechanism, bottle 2 uses a push-twist locking mechanism, and bottle 3 uses a pinch-twist locking mechanism. (B) Generalization performance on new, unseen bottles. Bottle 4 does not have a locking mechanism, and bottle 5 uses a push-twist locking mechanism. The bottles used in generalization have similar locking mechanisms but evoke significantly different haptic feedback (see text S1). On both the demonstration bottles and the unseen bottles, the best performance is achieved by the GEP, which combines the symbolic planner and the haptic model.

  • Fig. 5 Explanations generated by the symbolic planner and the haptic model.

    (A) Symbolic (mechanistic) and haptic (functional) explanations at a0, the start of the robot's action sequence. (B to D) Explanations at a2, a8, and a9, respectively, where ai refers to the ith action. Note that red on the robot gripper’s palm indicates a large magnitude of force applied by the gripper, and green indicates no force; intermediate values are interpolated. These explanations are provided in real time as the robot executes the task.

  • Fig. 6 Illustration of visual stimuli used in human experiment.

    All five groups observed the RGB video recorded from the robot executions but differed in their access to the explanation panels. (A) RGB video recorded from robot executions. (B) Symbolic explanation panel. (C) Haptic explanation panel. (D) Text explanation panel. (E) Summary of which explanation panels were presented to each group.

  • Fig. 7 Human results for trust ratings and prediction accuracy.

    (A) Qualitative measures of trust: average trust ratings for the five groups. (B) Average prediction accuracy for the five groups. The error bars indicate the 95% confidence intervals. Across both measures, the GEP group performs best. For qualitative trust, the text group performs most similarly to the baseline group. For a tabular summary of the data, see table S1.

  • Fig. 8 An example of the GEP.

    (A) A classifier is applied to a six-frame signal and outputs a probability matrix that serves as the parser’s input. (B) Table of the probabilities cached by the algorithm. For each expanded action sequence, it records the parsing probability at each time step and the prefix probability. (C) Grammar prefix tree with the classifier likelihood. The GEP expands a grammar prefix tree and searches within it; it finds the best action sequence when it reaches the parsing terminal e. It finally outputs the best label, “grasp, pinch, pull,” with a probability of 0.033. The probabilities of child nodes do not sum to 1 because grammatically incorrect nodes are eliminated from the search and the probabilities are not renormalized (22). A simplified sketch of the cached parsing probability is included with the code sketches following the figure captions.
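
To make the architecture in Fig. 2B concrete, the following Python sketch lays out the three networks described in its caption: an autoencoder over the 26-dimensional human force state, an embodiment mapping from the 3-dimensional robot force state into the human embedding space, and an action prediction network that takes the embedding and the current action. Only the input dimensions come from the caption; the embedding size, hidden widths, action vocabulary, and layer choices are illustrative assumptions rather than the authors' reported hyperparameters (the actual architectures and training details are given in text S2.2 and tables S2 to S5).

    # A minimal PyTorch sketch of the three networks in Fig. 2B. Only the input
    # dimensions (26-D glove forces, 3-D gripper forces) come from the caption;
    # everything else (embedding size, hidden widths, action count) is assumed.
    import torch
    import torch.nn as nn

    EMBED_DIM = 8     # assumed size of the shared human embedding
    NUM_ACTIONS = 10  # assumed size of the action vocabulary

    class HumanAutoencoder(nn.Module):
        """Compresses the 26-D human force state and reconstructs it to verify
        that the embedding keeps the essential information."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(26, 16), nn.ReLU(), nn.Linear(16, EMBED_DIM))
            self.decoder = nn.Sequential(nn.Linear(EMBED_DIM, 16), nn.ReLU(), nn.Linear(16, 26))
        def forward(self, human_state):
            embedding = self.encoder(human_state)
            return embedding, self.decoder(embedding)

    class EmbodimentMapping(nn.Module):
        """Maps the 3-D robot force state to an equivalent human embedding."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, EMBED_DIM))
        def forward(self, robot_state):
            return self.net(robot_state)

    class ActionPredictor(nn.Module):
        """Predicts the next action from a human embedding and the current action (one-hot)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(EMBED_DIM + NUM_ACTIONS, 32), nn.ReLU(),
                                     nn.Linear(32, NUM_ACTIONS))
        def forward(self, embedding, current_action):
            return self.net(torch.cat([embedding, current_action], dim=-1))

    # At execution time the robot "imagines itself as a human": its own force
    # reading is mapped into the human embedding space before predicting the next action.
    robot_state = torch.randn(1, 3)
    current_action = torch.zeros(1, NUM_ACTIONS)
    current_action[0, 0] = 1.0
    next_action_logits = ActionPredictor()(EmbodimentMapping()(robot_state), current_action)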
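
The And-Or grammar in Fig. 3 can be summarized with a small data structure: And-nodes expand all of their children in order (yielding the temporal ordering in the parse graph), while Or-nodes choose one child according to the parsing probabilities on their outgoing edges. The sketch below uses hypothetical node names and branch probabilities; in the paper, the grammar and its probabilities are induced from the human demonstrations.

    # A minimal sketch of an And-Or grammar and of sampling one parse (an action
    # sequence) from it. Node names and branch probabilities are placeholders.
    import random

    class AndNode:
        """All children are expanded, in order."""
        def __init__(self, children):
            self.children = children

    class OrNode:
        """One child is chosen according to the parsing probabilities."""
        def __init__(self, children, probs):
            self.children = children
            self.probs = probs

    def sample_parse(node):
        """Sample one action sequence (a parse graph) from the grammar."""
        if isinstance(node, str):                  # terminal action, e.g., "grasp"
            return [node]
        if isinstance(node, AndNode):
            seq = []
            for child in node.children:
                seq.extend(sample_parse(child))
            return seq
        chosen = random.choices(node.children, weights=node.probs, k=1)[0]
        return sample_parse(chosen)

    # Hypothetical fragment: after grasping, open the cap by pinch-twist or push-twist.
    grammar = AndNode([
        "approach", "grasp",
        OrNode([AndNode(["pinch", "twist"]), AndNode(["push", "twist"])], probs=[0.6, 0.4]),
        "pull",
    ])
    print(sample_parse(grammar))   # e.g., ['approach', 'grasp', 'pinch', 'twist', 'pull']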
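
Finally, the parsing probability cached by the GEP in Fig. 8B can be illustrated with a simplified dynamic program: given the classifier's frame-wise probability matrix, it sums, over all ways of segmenting the frames, the probability that consecutive frame labels spell out a candidate action sequence. The function below is only a stand-in for that quantity; the actual GEP computes prefix and parsing probabilities incrementally while expanding the grammar prefix tree and pruning grammatically invalid branches (22), and the 6-by-3 probability matrix used here is invented rather than taken from the figure.

    # A simplified stand-in for the parsing probability cached per expanded action
    # sequence in Fig. 8B. The probability matrix below is made up for illustration;
    # it is not the matrix shown in Fig. 8A.

    def parsing_probability(action_seq, prob_matrix):
        """prob_matrix[t][a] is the classifier probability of action a at frame t.
        Returns the probability that the frame labels, read in order with repeats
        merged, spell out action_seq (summed over all segmentations)."""
        n = len(action_seq)
        # dp[i]: probability of currently being in action_seq[i] after the frames seen so far
        dp = [0.0] * n
        dp[0] = prob_matrix[0][action_seq[0]]
        for t in range(1, len(prob_matrix)):
            new_dp = [0.0] * n
            for i in range(n):
                stay = dp[i]                           # remain in the same action
                advance = dp[i - 1] if i > 0 else 0.0  # move on to the next action in the sequence
                new_dp[i] = prob_matrix[t][action_seq[i]] * (stay + advance)
            dp = new_dp
        return dp[-1]   # the whole sequence must be completed by the last frame

    # Hypothetical 6-frame, 3-action matrix (0 = grasp, 1 = pinch, 2 = pull).
    probs = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
        [0.1, 0.1, 0.8],
    ]
    print(parsing_probability([0, 1, 2], probs))   # probability of "grasp, pinch, pull"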

Supplementary Materials

  • robotics.sciencemag.org/cgi/content/full/4/37/eaay4663/DC1

    Text S1. Additional model results

    Text S2. Additional materials and methods

    Text S2.1. Model limitations

    Text S2.2. Training details of embodied haptic model

    Text S2.3. Details on tactile glove

    Text S2.4. Force visualization

    Text S2.5. Additional human experiment details

    Fig. S1. Additional generalization experiments on bottles augmented with different 3D-printed caps.

    Fig. S2. Examples of estimated trends for testing and generalization haptic data.

    Fig. S3. The confusion matrix of Δ across different bottles based on the haptic signals.

    Fig. S4. An example of action grammars and grammar prefix trees used for parsing.

    Fig. S5. An example of the GEP.

    Fig. S6. Tactile glove hardware design.

    Fig. S7. Qualitative trust question asked to human participants after observing two demonstrations of robot execution.

    Fig. S8. Prediction accuracy question asked to human participants after each segment of the robot’s action sequence during the prediction phase of the experiment.

    Table S1. Numerical results and SDs for human participant study.

    Table S2. Network architecture and parameters of the autoencoder.

    Table S3. Network architecture and parameters for robot to human embedding.

    Table S4. Network architecture and parameters for action prediction.

    Table S5. Hyperparameters used during training.

    Table S6. Specifications of the computing platform used in the experiments.

    Algorithm S1. Algorithm of the improved GEP for robot planning.

    Movie S1. Example explanation video of full model shown to human participants.

    Movie S2. Example haptic explanation video shown to human participants.

    Movie S3. Example symbolic explanation video shown to human participants.

    Movie S4. Example text explanation video shown to human participants.

    Movie S5. Example baseline video shown to human participants.

    Data S1. Final data files (.zip).

    Code S1. Software files (.zip).

