Research Article | ARTIFICIAL INTELLIGENCE

Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs


Science Robotics, 16 Jan 2019:
Vol. 4, Issue 26, eaav3150
DOI: 10.1126/scirobotics.aav3150
  • Fig. 1 People can easily understand the concept conveyed in pairs of images, a capability that is exploited by LEGO and IKEA assembly diagrams.

    (A) People interpret the concept conveyed by these images as stacking red objects vertically on the right and green objects horizontally at the bottom. (B) Given a novel image, people can predict what the result of executing the concept would be. (C) Concepts inferred from schematic images as in (A) can be applied in real-world settings. (D) Enabling robots to understand concepts conveyed in image pairs would significantly simplify communicating tasks to them. (E) Not all concepts conveyed as input-output pairs are as readily apparent to humans as these visual and spatial reasoning tasks.

  • Fig. 2 Architecture and the full instruction set of the visual cognitive computer (VCC).

    (A) Building blocks of the VCC and their interactions. The vision hierarchy (VH) parses an input scene into objects and can attend to objects and imagine them. The hand controller has commands for moving the hand to different locations in the scene, and commands from the fixation controller position the center of the eye. Object-indexing commands iterate through the objects currently attended to. The attention controller can set the current attention based on object shape or color. (B) The full instruction set of the VCC. Parentheses denote instructions with arguments. All concepts are represented as learned sequences of these 24 primitive instructions.
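
A minimal sketch of the architecture in Fig. 2A, written as a toy Python emulator: a program is a list of primitive instructions acting on a small working state (parsed objects, the attention set, an object-indexing pointer, and the hand and fixation positions). The class, instruction names, and state fields below are illustrative assumptions, not the VCC's actual 24 primitives.

```python
class VCCError(Exception):
    """Raised when an instruction cannot execute (an 'exception' node in Fig. 4A)."""


class ToyVCC:
    """Toy emulator in the spirit of the VCC; not the paper's implementation."""

    def __init__(self, objects):
        # Each object is a dict: {"shape": str, "color": str, "pos": (x, y)}.
        self.objects = [dict(o) for o in objects]  # parsed scene (vision hierarchy)
        self.attended = self.objects               # current attention set
        self.obj_index = 0                         # object-indexing pointer
        self.hand = (0.0, 0.0)                     # hand-controller state
        self.fixation = (0.0, 0.0)                 # fixation-controller state

    # Attention controller: restrict attention by color.
    def attend_color(self, color):
        self.attended = [o for o in self.objects if o["color"] == color]
        if not self.attended:
            raise VCCError(f"no {color} objects to attend to")
        self.obj_index = 0

    # Object indexing: advance to the next attended object.
    def next_object(self):
        self.obj_index += 1
        if self.obj_index >= len(self.attended):
            raise VCCError("ran out of attended objects")

    # Fixation controller: move the center of the eye to the attended object.
    def fixate_attended(self):
        self.fixation = self.attended[self.obj_index]["pos"]

    # Hand controller: move the hand, or drag the attended object with it.
    def move_hand_to_attended(self):
        self.hand = self.attended[self.obj_index]["pos"]

    def move_attended_to_hand(self):
        self.attended[self.obj_index]["pos"] = self.hand

    def run(self, program):
        # A program is a list of (instruction_name, args) tuples.
        for name, args in program:
            getattr(self, name)(*args)
        return self.objects
```
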

  • Fig. 3 Concepts and their representation as cognitive programs.

    (A) Input-output examples for 15 different tabletop concepts. In our work, we tested on 546 different concepts (see the Supplementary Materials for the full list). (B) A manually written program for a concept that requires moving the central object to touch the other object. The images on the right show different stages during the execution of the program, with the corresponding line numbers indicated. The attended object is indicated by a blue outline.
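
For flavor, here is a hypothetical program in the toy DSL sketched after Fig. 2 for a simplified version of the Fig. 3B concept, with co-location standing in for a genuine "touch" test; the paper's manually written program uses its own primitives.

```python
# Hypothetical rendering of a simplified Fig. 3B-style concept in the toy DSL:
# move the red object to where the blue object is.
touch_program = [
    ("attend_color", ("blue",)),     # attend to the target object
    ("move_hand_to_attended", ()),   # place the hand on it
    ("attend_color", ("red",)),      # switch attention to the object to move
    ("move_attended_to_hand", ()),   # drag it to the hand position
]

scene = [
    {"shape": "square", "color": "red", "pos": (0.2, 0.4)},
    {"shape": "circle", "color": "blue", "pos": (0.7, 0.5)},
]
print(ToyVCC(scene).run(touch_program))  # the red object ends up at (0.7, 0.5)
```
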

  • Fig. 4 Program search and discovered programs.

    (A) Program induction searches in an exponential space represented as a tree, where each node (solid circle) is a program. Blue branches are instruction-to-instruction transition probabilities modeled using a generative model, and green branches are instruction-to-argument probabilities predicted using discriminative neural nets trained on input-output images. The probability of a program depends on the weights of the branches leading to its node. Solid red circles are programs that generated an exception in the VCC, and the green node is a correct program. (B to D) Three examples of discovered programs and visualizations of their execution steps. Digits next to the visualizations correspond to program line numbers.
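
A rough sketch of the prior-guided search depicted in Fig. 4A, assuming an order-1 transition model over instructions (blue branches) and a stub interface for the discriminative argument predictor (green branches); the function names and the best-first strategy below are assumptions, not the paper's implementation.

```python
import heapq
import itertools
import math


def search_programs(instructions, trans_prob, arg_prob, check, max_len=6):
    """Best-first search over instruction sequences, in the spirit of Fig. 4A.

    trans_prob(prev, nxt) -> P(next instruction | previous instruction)
    arg_prob(instr)       -> list of (args, prob) predicted from the example images
    check(program)        -> True if running the program maps the input images
                             to the output images; may raise inside the emulator
    """
    tie = itertools.count()                # tie-breaker so the heap never compares programs
    frontier = [(0.0, next(tie), [])]      # (negative log-probability, tie, program)
    while frontier:
        neg_logp, _, prog = heapq.heappop(frontier)
        if prog:
            try:
                if check(prog):            # execute on the emulator
                    return prog            # first passing program in best-first order
            except Exception:
                pass                       # red nodes in Fig. 4A: emulator exceptions
        if len(prog) >= max_len:
            continue
        prev = prog[-1][0] if prog else "<start>"
        for instr in instructions:
            for args, p_arg in arg_prob(instr):
                p = trans_prob(prev, instr) * p_arg
                if p > 0.0:
                    heapq.heappush(
                        frontier,
                        (neg_logp - math.log(p), next(tie), prog + [(instr, args)]),
                    )
    return None
```
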

  • Fig. 5 Program induction details.

    (A to C) Length distribution of induced programs for the first three E-C iterations. The x-axis bins correspond to program lengths (number of atomic instructions). The gray bars represent the total number of programs of each length in a set of manually written programs covering all concepts. (D) Distribution at the end of all iterations. (E and F) Number of induced programs for different search budgets and for different model options. (G and H) Effect of subroutines.
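
Panels G and H concern the effect of subroutines. Below is one simple way to mine candidate subroutines, assuming they are just frequent contiguous instruction n-grams in the programs induced so far; the paper's procedure may differ.

```python
from collections import Counter


def mine_subroutines(programs, min_len=2, max_len=4, min_count=3):
    """Count frequent contiguous instruction n-grams across induced programs.

    N-grams that recur often enough can be promoted to single new instructions
    (subroutines), which shortens future programs and makes them easier to find.
    """
    counts = Counter()
    for prog in programs:
        names = [instr for instr, _args in prog]
        for n in range(min_len, max_len + 1):
            for i in range(len(names) - n + 1):
                counts[tuple(names[i:i + n])] += 1
    return [ngram for ngram, c in counts.most_common() if c >= min_count]
```
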

  • Fig. 6 Program induction and generalization.

    (A) The instruction-to-instruction transition matrix after different E-C iterations. (B) Length distribution of programs induced using the order-0 versus the order-1 model. (C) Training curve: most concepts are solved with just a few examples. (D) An example showing wrongly induced programs when only three training examples of a concept are presented: accidental patterns in the data can explain those examples. In this case, the correct concept was induced once a fourth example was provided.
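
A sketch of the order-0 versus order-1 distinction in Fig. 6 (A and B), assuming the order-0 model is a unigram distribution over instructions and the order-1 model is a first-order Markov transition table, both re-estimated from the induced programs after each E-C iteration; the smoothing constant and function names are assumptions. The resulting table plays the role of the transition matrix visualized in Fig. 6A.

```python
from collections import Counter, defaultdict


def fit_order0(programs, instructions, alpha=0.1):
    """Order-0 model: P(instruction), ignoring what came before it."""
    counts = Counter(instr for prog in programs for instr, _args in prog)
    total = sum(counts.values()) + alpha * len(instructions)
    return {i: (counts[i] + alpha) / total for i in instructions}


def fit_order1(programs, instructions, alpha=0.1):
    """Order-1 model: P(next instruction | previous instruction)."""
    bigrams = defaultdict(Counter)
    for prog in programs:
        names = ["<start>"] + [instr for instr, _args in prog]
        for prev, nxt in zip(names, names[1:]):
            bigrams[prev][nxt] += 1
    table = {}
    for prev in ["<start>"] + list(instructions):
        total = sum(bigrams[prev].values()) + alpha * len(instructions)
        table[prev] = {i: (bigrams[prev][i] + alpha) / total for i in instructions}
    return table
```
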

  • Fig. 7 Generalizing to new settings and to the real world.

    (A) Training examples and the induced program corresponding to the concept “move topmost to left and blue to top.” The right columns show different test settings to which the program generalizes; the test settings all differ substantially from the training setting in everything except their conceptual content. (B) The concept in (A) executed on a Baxter robot with objects very different from those in the training setting. Different stages of the execution are shown.

  • Fig. 8 Learned concepts transferred to different real-world settings.

    (A) Each row shows the starting state, an intermediate state, and the ending state for three different execution scenarios of a concept that requires stacking objects on the bottom left. The middle row shows execution with different objects, and the bottom row shows execution on a different background. (B) Execution frames from an application that separates limes from lemons, achieved by composing two concepts in sequence. Left: the two concepts used (top and bottom); right: execution of these concepts in sequence to achieve the task.

Supplementary Materials

  • robotics.sciencemag.org/cgi/content/full/4/26/eaav3150/DC1

    Text

    Fig. S1. Schematic showing local bounded working memory mappings in VCC for an example program.

    Fig. S2. Features extracted from example images used for argument prediction.

    Fig. S3. Argument prediction network architecture.

    Fig. S4. Examples of valid test input images for three different concepts.

    Table S1. List of primitive functions.

    Movie S1. The concept of moving yellow objects toward the left and green objects toward the right is taught through schematic images and transferred for execution on a robot to separate lemons from limes.

    Movie S2. A robot executing the concept of arranging objects in a circle under various settings.

    Movie S3. Robots executing the concept of stacking objects on the bottom left in a variety of settings.

    Movie S4. Robots executing the concept of moving the yellow object to the bottom left corner and the green object to the top right corner in a variety of settings.

    Movie S5. Robots executing the concept of stacking objects vertically in place in a variety of settings.

