See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion


Science Robotics  30 Jan 2019:
Vol. 4, Issue 26, eaav3123
DOI: 10.1126/scirobotics.aav3123
  • Fig. 1 Robot setup.

    (A) Physical setup consisting of the robot, Jenga tower, Intel RealSense D415 camera, and ATI Gamma force/torque sensor (mounted at the wrist). (B) Machine intelligence architecture with the learned physics model.

  • Fig. 2 Jenga setup in simulation and the baseline comparisons.

    (A) The simulation setup is designed to emulate the real-world implementation. (B) Learning curves of the different approaches, with confidence intervals evaluated over 10 attempts. Solid lines denote the median performance; shading denotes one standard deviation. (C) Visual depiction of the structure of the MOR and the proposed approach (HMA).

  • Fig. 3 Concepts learned from exploration data.

    Means and covariances of the four clusters are projected to the space of “normal force (N),” “block rotation (rad),” and “block extraction/depth (dm).” The four clusters carry intuitive semantic meanings, and we refer to them as follows: green, “no block”; gray, “no move”; blue, “small resistance”; and yellow, “hard move.”
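The four concepts of Fig. 3 are Gaussian clusters in the (normal force, block rotation, extraction depth) space, and a new measurement can be attributed to the nearest one. A minimal nearest-mean sketch of that idea; the cluster means below are illustrative assumptions, not the learned values (the paper's clusters are full Gaussians with covariances):

```python
# Nearest-mean assignment in the (force [N], rotation [rad], depth [dm]) space
# of Fig. 3. All mean values here are hypothetical placeholders.
CLUSTER_MEANS = {
    "no block":         (0.0, 0.0, 0.0),   # hypothetical
    "no move":          (3.0, 0.0, 0.1),   # hypothetical
    "small resistance": (1.0, 0.05, 0.5),  # hypothetical
    "hard move":        (4.0, 0.1, 0.3),   # hypothetical
}

def classify(measurement):
    """Assign a (force, rotation, depth) triple to the nearest cluster mean."""
    def sq_dist(mean):
        return sum((m - x) ** 2 for m, x in zip(mean, measurement))
    return min(CLUSTER_MEANS, key=lambda name: sq_dist(CLUSTER_MEANS[name]))
```

A full Mahalanobis-distance version would also use the cluster covariances shown in the figure.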

  • Fig. 4 Learned intuitive physics.

    (A) Overlay of the analytical friction cone and predicted forces given the current measurements. The friction coefficient between the finger material (PLA) and wood is between 0.35 and 0.5; here, we use 0.42 as an approximation. (B) Normal force applied to the tower as a function of the height of the tower. Each box plot depicts the minimum, maximum, median, and standard deviation of the force measurements.
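The friction-cone check in Fig. 4A reduces to the Coulomb condition |f_t| ≤ μ·f_n, with the cone half-angle atan(μ). A minimal sketch using the caption's μ = 0.42; the force values in the usage below are illustrative, not measured data:

```python
# Coulomb friction-cone consistency check (Fig. 4A). mu = 0.42 is the
# caption's approximation for PLA on wood (reported range 0.35-0.5).
import math

MU = 0.42  # approximate PLA-on-wood friction coefficient

def within_friction_cone(f_normal: float, f_tangential: float, mu: float = MU) -> bool:
    """True if the tangential force lies inside the Coulomb friction cone."""
    return abs(f_tangential) <= mu * f_normal

def cone_half_angle(mu: float = MU) -> float:
    """Half-angle of the friction cone, in radians: atan(mu)."""
    return math.atan(mu)
```

For a 10 N normal force, tangential forces up to 4.2 N stay inside the cone; the half-angle is about 0.40 rad (23°).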

  • Fig. 5 Inference using the learned representation.

    Evolution of the beliefs of the robot as it interacts with the tower. (A) For a block that is stuck. (B) For a block that moves easily. Error bars indicate 1 SD.
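The evolving beliefs of Fig. 5 are a probability distribution over the four concepts of Fig. 3, updated as new measurements arrive. A minimal Bayes-update sketch; the concept names come from Fig. 3, but the likelihood values in the usage are illustrative assumptions:

```python
# One-step Bayesian belief update over the four learned concepts (Fig. 5):
# posterior ∝ prior × likelihood, renormalized. Likelihoods would come from
# the learned Gaussian clusters; here they are supplied directly.
CONCEPTS = ["no block", "no move", "small resistance", "hard move"]

def update_belief(prior, likelihood):
    """Return the normalized posterior for one measurement."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

Starting from a uniform prior, a measurement that is most likely under "small resistance" shifts the belief toward that concept, mirroring the trends plotted in the figure.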

  • Fig. 6 Controlled block pushing.

    The robot selects the point on the block and the appropriate angle to push with such that it realigns the block with the goal configuration. Here, the block is beginning to rotate counterclockwise and is starting to move out of the tower. The robot selects a point close to the edge of the block and pushes it back in toward the tower center. We convert angles to normalized distances by scaling with the radius of gyration of the block (0.023 m). We have exaggerated the block translation to illustrate the fine-grained details of motion.
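The caption's conversion of angles to normalized distances is a single scaling by the block's radius of gyration (0.023 m). A one-line sketch of that conversion; the input angle in the usage is illustrative:

```python
# Convert a block rotation (rad) to an equivalent translation (m) by scaling
# with the radius of gyration, 0.023 m, as stated in the Fig. 6 caption.
RADIUS_OF_GYRATION = 0.023  # meters

def angle_to_distance(angle_rad: float, r_gyr: float = RADIUS_OF_GYRATION) -> float:
    """Map a rotation angle to a normalized distance: d = theta * r_gyr."""
    return angle_rad * r_gyr
```

For example, a 0.1 rad rotation corresponds to a 2.3 mm normalized displacement, which puts rotations and translations on a common scale for control.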

  • Fig. 7 The vision system.

    (A) We use a 280-by-280 patch with the tower at the center. For clarity, in this figure, we crop the irrelevant regions on both sides. (B) Segmented blocks. (C) For a single push, we identify the block being pushed and, based on its associated segment, infer the underlying pose and position of the block using an HMM.

  • Table 1 Summary statistics for exploration and learned physics.

    A comparison of the performances of the robot using the exploration strategy and the learned model.

    Each cell lists attempts, successes, and the success rate.

    Block position   Action    Exploration        Learned
    All              Push      403, 172 (42.7%)   203, 96 (45.8%)
    All              Extract   172, 97 (56.4%)    93, 82 (88.2%)
    All              Place     97, 85 (87.6%)     82, 72 (87.8%)
    Side             Push      288, 122 (42.4%)   133, 69 (51.9%)
    Side             Extract   122, 52 (42.6%)    69, 54 (78.3%)
    Side             Place     52, 44 (84.6%)     54, 49 (90.7%)
    Middle           Push      115, 50 (43.5%)    70, 33 (47.1%)
    Middle           Extract   50, 45 (90.0%)     33, 28 (84.8%)
    Middle           Place     45, 41 (91.1%)     28, 23 (82.1%)
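The success rates in Table 1 are simply successes divided by attempts. A one-line check that reproduces a few of the table's percentages:

```python
# Success rate per Table 1 cell: successes / attempts, as a percentage
# rounded to one decimal place (matching the table's precision).
def success_rate(successes: int, attempts: int) -> float:
    """Success rate as a percentage, rounded to one decimal."""
    return round(100.0 * successes / attempts, 1)
```

For instance, 97 successful extractions out of 172 attempts gives 56.4%, matching the "All / Extract / Exploration" cell.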

Supplementary Materials


    The PDF file includes:

    • Additional Materials and Methods


