Deep learning can accelerate grasp-optimized motion planning

See allHide authors and affiliations

Science Robotics  18 Nov 2020:
Vol. 5, Issue 48, eabd7710
DOI: 10.1126/scirobotics.abd7710


Robots for picking in e-commerce warehouses require rapid computing of efficient and smooth robot arm motions between varying configurations. Recent results integrate grasp analysis with arm motion planning to compute optimal smooth arm motions; however, computation times on the order of tens of seconds dominate motion times. Recent advances in deep learning allow neural networks to quickly compute these motions; however, they lack the precision required to produce kinematically and dynamically feasible motions. While infeasible, the network-computed motions approximate the optimized results. The proposed method warm starts the optimization process by using the approximate motions as a starting point from which the optimizing motion planner refines to an optimized and feasible motion with few iterations. In experiments, the proposed deep learning–based warm-started optimizing motion planner reduces compute and motion time when compared to a sampling-based asymptotically optimal motion planner and an optimizing motion planner. When applied to grasp-optimized motion planning, the results suggest that deep learning can reduce the computation time by two orders of magnitude (300×), from 29 s to 80 ms, making it practical for e-commerce warehouse picking.


The Coronavirus Disease 2019 pandemic greatly increased demand for e-commerce and reduced the ability of warehouse workers to fill orders in close proximity, driving interest in robots for order fulfillment. However, despite recent advances in grasp planning [e.g., Mahler et al. (1)], the planning and executing of robot motion remain a bottleneck. To address this, in prior work, we introduced a Grasp-Optimized Motion Planner (GOMP) (2) that computes a time-optimized motion plan (see Fig. 1) subject to joint velocity and acceleration limits and allows for degrees of freedom in the pick-and-place frames (see Fig. 2). The motions that GOMP produces are fast and smooth; however, by not taking into account the motion’s jerk (change in acceleration), the robot arm will often rapidly accelerate at the beginning of each motion and rapidly decelerate at the end. In the context of continuous pick-and-place operations in a warehouse, these high-jerk motions could result in wear on the robot’s motors and reduce the overall service life of a robot. In this paper, we introduce jerk limits and find that the resulting sequential quadratic program (SQP) and its underlying quadratic program (QP) require computation on the order of tens of seconds, which is not practical for speeding up the overall pick-and-place pipeline. We then present DJ (Deep-learning Jerk-limited)–GOMP, which uses a deep neural network to learn trajectories that warm start computation, yielding a reduction in computation times from 29 s to 80 ms, making it practical for industrial use.

Fig. 1 Grasp-optimized motion planning in action.

The proposed motion planner computes a time- and jerk-optimized motion for pick-and-place operations, using a combination of deep learning and optimization. Time optimization makes the motions fast (sub-second). Jerk (change in acceleration) optimization avoids overshooting and reduces wear over long term repeated operation. For a given pair of start and end robot configurations, deep learning rapidly computes an approximation of the optimal motion that can violate motion constraints (e.g., collides with a bin, exceeds joint limits). The motion planner then feeds the approximation to an optimization process to minimize jerk and fix up the constraint violations. By using the deep-learning-based approximation, the computation time speeds up by 300×.

Fig. 2 Grasp-optimized motion planning degrees of freedom.

The optimized motion planning allows for degrees of freedom to be added to the pick and or place frames. In (A), grasp analysis produces a top-down grasp. Because the analysis is based on two contact points, the motion planner allows for rotation about the grasp contact points shown as ±60° rotations in (B) and (C). Similarly, reversing the contact points, visible in (D) as a different arm pose, will still be valid according to grasp analysis. DJ-GOMP computes an optimal rotation for pick and place frames that minimizes time and jerk of the motion.

For a given workcell environment, DJ-GOMP speeds up motion planning for a robot and a repeated task through a three-phase process. The first phase randomly samples tasks from the distribution of tasks the robot is likely to encounter and generates a time- and jerk-minimized motion plan using an SQP. The second phase trains a deep neural network using the data from the first phase to compute time-optimized motion plans for a given task specification (Fig. 3). The third phase, used in pick-and-place, uses the deep network from the second phase to generate a motion plan to warm start the SQP from the first phase. By warm starting the SQP from the deep network’s output, DJ-GOMP ensures that the motion plan meets the constraints of the robot (something the network cannot guarantee) and greatly accelerates the convergence rate of the SQP (something the SQP cannot do without a good initial approximation).

Fig. 3 A deep neural network architecture for grasp optimized motion planning.

The input is the start and goal grasp frames (A). Each “FDE” block (B) sequences a fully connected (FC) layer (C), a dropout layer (D), and an exponential linear unit (ELU) layer (E). The output (F) is a trajectory τH from the start frame to the goal frame for each of the time steps H supported by the network. A separate network uses one-hot encoding to predict which of the output trajectories is the shortest valid trajectory.

This paper describes algorithms and training process of DJ-GOMP. In Results, we perform experiments on a physical Universal Robotics UR5 manipulator arm, verifying that the trajectories GOMP generates are executable on a physical robot and result in fast and smooth motion. This paper provides the following contributions: (i) J-GOMP, an extension of GOMP that computes time-optimized jerk-limited motions for pick-and-place operations; (ii) DJ-GOMP, an extension of J-GOMP that uses deep learning of time-optimized motion plans that empirically speeds up the computation time of the J-GOMP optimization by two orders of magnitude (300×); (iii) comparison to optimally time-parameterized Probabilistic Road Maps “Star” (PRM*) and TrajOpt motion planners in compute and motion time suggesting that DJ-GOMP computes fast motions quickly; and (iv) experiments in simulation and on a physical UR5 robot suggesting that DJ-GOMP can be practical for reducing jerk to acceptable limits.


Time-optimized motion planning

We consider the problem of automating grasping and placing motions of a manipulator arm while avoiding obstacles and minimizing jerk and time. Minimizing motion time requires incorporating the robot’s velocity and acceleration limits. We cast this as an optimization problem with nonconvex constraints and compute an approximation using an SQP.

To plan a robot’s motion, we compute a trajectory τ as a sequence of configurations (q0, q1, …, qn), in which each configuration qi is the complete specification of the robot’s degrees of freedom. Of the set of all configurations C, the robot is in collision for a portion CobsC. The remainder Cfree=C\Cobs is the free space. For the motion to be valid, each configuration must be in the free space qCfree and be within the joint limits [qmin, qmax].

The motion starts with the robot’s end effector at a grasp frame g0SE(3) and ends at a place frame gHSE(3). Grasps for parallel-jaw grippers have an implied degree of freedom about the axis defined by the grasp contact points. Similarly, suction-cup grippers have one about the contact normal. The implied degrees of freedom means that the start of the motion is constrained to a setG0={gigi=Rc(θ)g0+t,θ[θmin,θmax],t[tmin,tmax]}where Rc(·) is a rotation about the free axis c, θmin and θmax bound the angle of rotation, and tmin ∈ ℝ3 and tmax ∈ ℝ3 bound the translation. The place frame may have similarly formulated, but different degrees of freedom based on packing requirements.

To be dynamically feasible, trajectories must also remain within the velocity, acceleration, and jerk limits (vmax, amax, and jmax) of the robot.

Treating τ;RC as a function of time and defining a function h;TR as the duration of the trajectory, where T is the set of all trajectories, the objective of DJ-GOMP is to computeargminτ h(τ)s.t. (such that) τ(t)[qmin,qmax]Cfree t[0,h(τ)]dτdt[Vmax,Vmax]  t[0,h(τ)]d2τdt(t)[amax,amax]  t[0,h(τ)]d3τdt(t)[jmax,jmax] t[0,h(τ)]p(τ(0))G0p(τ(h(τ)))GHwhere p;CSE(3) is the robot’s forward kinematic function to gripper frame. In addition, should multiple trajectories satisfy the above minimization, DJ-GOMP computes a trajectory that minimizes sum-of-squared jerks over time.

Computing motion plans

We propose a multistep process for computing motion plans quickly. The underlying motion planner is based on an SQP proposed in GOMP (2), which is a time-optimizing extension of TrajOpt (3) that incorporates a depth map for obstacle avoidance, degrees of freedom at pick and place points, and robot dynamic limits. In GOMP and its extensions in this work, trajectories are discretized into a sequence of H + 1 waypoints separated by a fixed time interval tstep, where tstep is a tunable parameter, and H is the computed (time) horizon of the motion (borrowing the term from receding horizon control methods). In this work, we extend the SQP in GOMP to include jerk limits and minimization to create J-GOMP, a jerk-limited motion planner. J-GOMP produces jerk-limited motion plans but at a substantially increased compute time.

To address the slow computation, we train a deep neural network to approximate J-GOMP. Because the network approximates J-GOMP, we use J-GOMP to generate a training dataset consisting of trajectories for random pick and place points likely to be encountered at runtime (e.g., from location in a picking bin to a location in a placement bin). With GPU (graphics processing unit)–based acceleration, the network can compute approximate trajectories in milliseconds. However, the network cannot guarantee that the trajectories it generates will be kinematically or dynamically feasible or avoid obstacles.

To fix the trajectories generated by the network, we propose using the network’s trajectory to warm start the SQP from J-GOMP. The warm start allows the SQP to start from a trajectory much closer to the final solution and thus allows it to converge to an optimal solution quickly. Because the SQP enforces the pick, place, kinematic, dynamic, and obstacle constraints, the resulting trajectory is valid.

Physical experiments

We tested DJ-GOMP on a physical UR5 robot (4) fitted with a Robotiq 2F-85 (5) parallel gripper. In the experiment setup (see Fig. 4), the robot must move objects from one fixed bin location to another. We set DJ-GOMP to be constrained according to the specified joint configuration and velocity limits of the UR5. We derived an acceleration limit based on the UR5’s documented torque and payload capacity, and we limited the jerk to a multiple of the computed acceleration limit. In practice, we surmise that an operator would define jerk limits by taking into account the desired service life of the robot.

Fig. 4 Physical experiment executing jerk-limited motion computed by DJ-GOMP on a UR5.

The motion starts by picking an object from the right bin (A), moves over the divider (B to D), and ends after placing the object in the left bin (E). Without the jerk limits, the motion takes 448 ms but results in a high jerk at the beginning and end of the motion, which, in this case, causes the UR5 robot to overshoot its end frame by a few millimeters. With jerk limits, the motion takes 544 ms, reduces wear, and does not overshoot the end frame.

To generate train/test data for the deep neural network, we use all 80 hardware threads of an NVIDIA DGX-1 to compute 100,000 optimized input and trajectory ([g0TgHT]T,x*) pairs, where x* is the discretized trajectory. The J-GOMP optimizer is written in C++ and uses Operator Splitting solver for Quadratic Program (OSQP) (6) as the underlying QP solver. The inputs it generates consist of random pick (t0) and place (tH) translations drawn uniformly from the pick and place physical space. For each generated translation, we also generate a top-down rotation angle (θ0 and θH) uniformly drawn from (0, π). Because a parallel gripper’s grasp has an equivalent, although kinematically different [see Fig. 2 (A and D)], grasp with a 180° rotation, for each translation + rotation grasp, we also add its rotation by 180°. Thus, for each random [t0TtHT]T pair, we add four grasp frames with rotation (θ0, θH), (θ0 + π, θH), (θ0, θH + π), (θ0 + π, θH + π) and their trajectories.

We trained the deep network with the Adadelta (7) optimizer for 50 epochs after initializing the weights using a He uniform initializer (8). The network architecture and optimization framework were written in Python using PyTorch. All training and deep network computations were accelerated by GPUs on NVIDIA DGX-1’s Tesla V100 SXM2 GPU and Intel Xeon E5-2698 v4 central processing units (CPUs).

To evaluate the ability of the deep-learning approach of DJ-GOMP to speed up motion planning, we computed 1000 random motion plans both without and with deep learning–based warm start and plot the results in Fig. 5. The median compute time without deep learning is 29.0 s. Using a network to estimate the optimal time horizon, but not the trajectory, can speed up computation significantly but at a cost of increased failure rate. Using the network to both predict the time horizon and the warm-start trajectory results in a median with deep learning of 80 ms; when compared to J-GOMP, this shows two orders of magnitude improvement, an approximate 300× speedup.

Fig. 5 Compute time distribution for 1000 random motion plans.

In these plots, the x axis shows total compute time in seconds for a single optimized trajectory. Plot (A) extends to 90 s, whereas plots (B) and (C) extend to 1 s. The y axis shows the distribution compute time required. The full optimization process without the deep-learning prediction, shown in the histogram in (A), requires orders of magnitude longer to compute. Using a deep network to predict the optimal time horizon for a trajectory, but not warm-starting the trajectory (B), leads to improvements in compute time, although with increased failures. Using the deep network to compute a trajectory to warm start the optimization (C) further improves the compute time. In (C), the plots include results for both estimated trajectory horizon H and the exact H from the full optimization to show the effect of misprediction of trajectory length—inexact predictions can result in a faster compute time because the resulting trajectory is suboptimal, thus less tightly constrained. The upper limit on the x axis is shown in red to highlight the difference in scale—plots (B) and (C) are magnified by two orders of magnitude.

To evaluate the effect on the optimality of the computed trajectories, we compared the sum-of-squared jerks between trajectories generated with the full SQP versus those generated with a warm-started prediction with the optimal horizon. We observe that more than 99% of the test trajectories are within 10−3 of each other, which is an error value that is within the tolerance bounds we set for the QP optimizer. For a small fraction (less than 1%), we observe that the warm-started optimization and the full optimization find different local minima, without clear benefit to either optimization.

Because the optimality of the trajectory and the failure rate is dependent on accurately predicting the optimal time horizon of a trajectory, we separately evaluated this prediction. We observe that shorter values of the horizon lead inevitably to SQP failures, whereas longer values lead to suboptimal trajectories. Because failures are likely to be more problematic than slighty slower trajectories, we propose a simple heuristic to predict longer horizons. When the network predicts a horizon longer than the optimal, we observe that the optimization of trajectories with suboptimal horizon can be faster than that of the optimal horizon (shown in Fig. 5B). This is likely due to the suboptimal trajectory being less constrained and thus faster to converge. In practice, we propose that using a readily available multicore CPU to simultaneous compute multiple SQPs for different horizons around the estimated horizon would be a practical way to address the failures and suboptimal trajectories. However, if constrained to a single-core computation, using a longer horizon may also be practical because the compute time saved may be more than time saved by using the optimal horizon.

To evaluate the effect on failure rate, we recorded the number of failures with both cold-started and warm-started optimization with the optimal horizon (observing that predicting short horizon is the other source of failures). Cold-started optimizations fail 10.7%, whereas warm-started optimizations fail 5.7%. These failures occur because the optimizer cannot move the trajectory into a feasible region due to the tight constraints. In experiments, the failure rate went down with additional training data and longer network training, suggesting that further improvement is possible.

We compare compute time and motion time performance to PRM* (9, 10) and TrajOpt (3). For PRM*, we precompute graphs of 10,000, 100,000, and 1,000,000 vertices over the workspace in front of the robot. Because PRM* is an asymptotically optimal motion planner, graphs with more vertices should produce shorter paths, at the expense of longer graph search time. For TrajOpt, we configure the optimization parameters to match that of DJ-GOMP, observing that this improves success rate over the default. Straight-line initialization in TrajOpt fails in this environment due to the bin wall between the start and end configurations; whereas DJ-GOMP’s specialized obstacle model moves the trajectory out of collision, TrajOpt’s obstacle model result in linearizations that do not push the trajectory out of collision. We thus initialize TrajOpt with a trajectory above the obstacles in the workspace. Because both PRM* and TrajOpt do not directly produce time-parameterized trajectories, we use Kunz et al.’s method (11) to compute time-optimal time parameterization. This time parameterization method first “rounds corners” by adding smooth rounded segments to connect the piecewise linear motion plan from PRM* before computing the optimal timing for each waypoint. Without the rounded corners, the robot would have to stop between each linear segment of the motion plan to avoid an instantaneous infinite acceleration. The radius of the corner rounding is tunable; however, rounding corners too much can result in a motion plan that collides with obstacles. This time parameterization also does not minimize or limit jerk and thus produces high jerk trajectories with peaks in the range 5 × 105 to 8 × 105 rad/s3 (Fig. 6A), meaning that they should have an advantage in motion time over jerk-limited motions (Fig. 7). As a final step, because 180° rotated parallel jaw grasps are equivalent, we compute trajectories for each pick and place combination and select the fastest motion. The results for 1000 pick-place pairs are shown in Fig. 6. We observe that PRM* has consistent fast compute times but produces the slowest trajectories. TrajOpt is slower to compute but produces faster trajectories than PRM*. DJ-GOMP, because it directly optimizes for a time-optimal path, produces the fast motions, whereas the deep-learning horizon prediction and warm start allow it to compute quickly despite complex constraints and result in the overall fastest combined compute and motion time.

Fig. 6 Maximum jerk and timing comparisons for 1000 pick-place pairs computed with PRM*, TrajOpt, and DJ-GOMP.

These graphs compare motion plan (A) jerk, (B) compute time, (C) motion time, and (D) combined compute + motion time. The filled boxes spans the first through third quartile with a horizontal line at the median. The whiskers extend from the minimum to maximum values. Paths computed by PRM* (9, 10) and TrajOpt (3) are subsequently optimally time parameterized (11). The time parameterization does not limit jerk as DJ-GOMP does, which allows for faster but high jerk motions. Even so, because DJ-GOMP directly optimizes the path, unlike PRM* and TrajOpt, DJ-GOMP generates the fastest motions; whereas its deep learning–based warm start allows for fast compute and motion times.

Fig. 7 Jerk limit’s effect on computed and executed motion.

We plot the jerk (y axis) of each joint in rad per cubic second over time in milliseconds (x axis) as computed (A) without jerk limits and (B) with jerk limits. Without jerk limits, the optimization computes trajectories with large jerks throughout the trajectory (shown in shaded regions). With jerk limits, each joint stays within the defined limits (the dotted lines) of the robot.

To evaluate whether motion plans that DJ-GOMP generates work on a physical robot, we have a UR5 follow trajectories that DJ-GOMP generates. An example motion is shown in Fig. 4. The UR5 controller does not allow the robot to exceed joint limits and issues an automated emergency stop when it does. The trajectories that DJ-GOMP generates are constrained to the documented limits and thus do not cause the stop. However, we have observed that, without jerk limits, a high-jerk trajectory can cause the UR5 to overshoot its target and bounce back. With DJ-GOMP’s jerk-limited trajectories, the UR5 empirically does not overshoot.


Experiments suggest that warm starting the J-GOMP optimizing motion planner with an approximation from deep learning can speed up motion planning with J-GOMP by two orders of magnitude, over 300×, and compute time-optimized jerk-limited trajectories with an 80-ms median compute time. The time optimization has potential to reduce picks per hour, an important metric in warehouse packing operations, whereas the jerk limits can reduce wear on robots, leading to longer lifespans and reduced downtime.

Deep network design

The design and training of the deep network that approximates the trajectories of J-GOMP have an important impact on the performance of the overall motion planning system. Trajectories that are closer to the J-GOMP solution will take fewer optimizations iterations, whereas trajectories further from the solution will take more optimization iterations. In the method we propose, we use deep network to predict both the optimal time horizon of the trajectory and the full trajectory for any horizon. In ablation studies, we tried a policy-style network that predicts an action to take based on the current state and the goal state. By feeding each state back into the network, it computes a sequence of states that form a trajectory. This network produced less stable results and resulted in longer times to produce an optimization. Although an 80-ms median compute time may be sufficient for many applications, further improvement may be possible with different design.

The choice of loss function used in the training strongly influences the warm-start speed. Although a mean squared error (MSE) loss, because it measures the difference between training data and the network’s output, should be sufficient if reduced to 0, we propose using a loss that is a weighted combination of MSE and a loss that encourages the network to produce dynamically feasible motions. Because the dynamics loss is consistent with the generated trajectories, using it did not significantly affect the reported MSE test loss but did result in trajectories that were smoother. The resulting smoothed trajectories were closer to a consistent solution and resulted in the optimizer requiring fewer SQP iterations to complete.

Training the network also benefits from a J-GOMP implementation and dataset that approaches a continuous function. Experimentally, we found that discontinuities made training the network difficult. To encourage continuity, we took the following steps: (i) The optimization smoothly varies around obstacles by performing a continuous collision detection based on the spline between waypoints, (ii) the cold-started optimizations starts from a deterministic and smoothly varying interpolation, and (iii) using the optimal trajectories with suboptimal horizons in the training dataset. We also observe that for a given start-goal pair, there can be multiple minimum time trajectories due to discretization of time. By minimizing jerk as well, J-GOMP provides a consistent mechanism for selecting a trajectory to learn.

Continuous learning

In continuous operation, a system will produce trajectories that can be used to train the deep network. When running the experiments, we found that more training data improved the predictions of the network. We hypothesize that we did not reach the limit of improvement, and continuous operation would provide a method by which additional training data can be generated. An additional benefit may come from such a feedback system. The initial training dataset that we propose is from a uniform random distribution over two volumes—the pick bin and place bin (Fig. 4). In practice, the distribution of trajectories is likely to be nonuniform, e.g., based on how objects form piles in each bin. Hence, the initial training distribution will likely be out of distribution with the system during operation, and other precomputation strategies (12) may produce a better initial results. By leveraging the data from repeated operation, the system should continue to gain data from which it can learn and thus produce better trajectories that will speed up the SQP computation.

Application to other robots and environments

We propose a system for speeding up motion planning and execution time and experimented on a UR5 robot in a pick-and-place operation. The kinematic design of this robot has favorable properties in this application and motion planning algorithm. The robot has two joints that can lift the end effector up from any configuration—with the depth map as the obstacle, this means that there will always be an obstacle-free trajectory, provided that there are a sufficient number of waypoints allocated to the trajectory to make the traversal. In addition, because of its 6-DOF (degrees of freedom) design, for any end-effector location, there exists eight analytic inverse kinematic solutions (13), allowing for rapid computation of multiple initial and final poses to seed the optimization process.

Application to robots with additional degrees of freedom would not only result in more inverse kinematic solutions but also allow the robot to have more options (in the form of different configurations) to avoid obstacles. In these cases, changes in the initial trajectory seeded to the optimization can result in the robot converging on a different homotopic path. For example, with a different obstacle environment, one seed might lead to an arm going above an obstacle, whereas a different seed would lead an arm going to the side of an obstacle. We hypothesize that this could be addressed in the proposed system by having a consistent solution to seeding a trajectory—one that results in a smooth function for the deep network to approximate.

Applications to other environments would require an additional data generation and training step specific to the new condition. In the experiments, we generated training and test datasets from the same distribution. If the test dataset were to come from a different (or held out) distribution, then the resulting covariate shift would decrease performance. In practice, however, we would generate training data from the new distribution.

Speeding up other optimized motion planners

The deep learning–based warm start of the optimization used by DJ-GOMP may also help speed up other optimizing motion planners such as TrajOpt (3), Covariant Hamiltonian Optimization for Motion Planning (CHOMP) (14), Stochastic Trajectory Optimization for Motion Planning (STOMP) (15), and Incremental Trajectory Optimization for Motion Planning (ITOMP) (16), ones based on interior-point optimization (17) and gradients (18). Many of these planners already compute solutions quickly, although with increased constraints, more complex obstacle environments, or additional way points in the descretization, they may slow down to the point where they become impractical to use without something like the deep learning–based warm start proposed in DJ-GOMP.

Integrated grasp and motion planning

In this paper, we explore speeding up the computation of jerk-limited motions for the pick-and-place task from GOMP in which both pick and place frames have an additional degree of freedom. The degree of freedom comes from the four degree–of–freedom representation commonly used by grasp analysis approaches such as Dexterity Network (Dex-Net) (1, 1921), Grasp Quality Convolutional Neural Network (GG-CNN) (22), Grasp Pose Detection (GPD) (23), or Fully Convolutional GQ-CNN (FC-GQ-CNN) (24). These data-driven methods often represent grasps using a center axis (1) or rectangle formulation (25) in the image plane (e.g., from a depth camera), which results in 4 DOF (a three-dimensional translation, plus a rotation about the camera z axis). Although we use FC-GQ-CNN (24) in experiments, we propose that many grasp analysis algorithms could be incorporated into the computation and learning process. However, on the basis of the grasp analysis software and gripper, modifications to the network design may be necessary. For example, recent work exploring additional degrees of freedom for grasps (2629) and showing that top-down grasps leave out many high quality grasps on many objects (30) may require an alternate formulation of the input to the network used for predicting the warm-start trajectory.

In future work, DJ-GOMP could be integrated with a grasp planner to optimize among multiple grasp configurations. Whether the grasp analysis method is from the first wave of grasping research based on analytic algorithms with physical models of contact dynamics and known geometry (3134), the second wave of research based on data-driven learning and neural networks (25, 3541), or the recent wave of research combining the two (1), many grasp analysis methods often have the ability to generate multiple ranked candidate grasps. With multiple forward passes using the DJ-GOMP network, grasp candidates from these methods could be rapidly weighted on the basis of their potential execution speed. This would allow the combination of grasp analysis software and DJ-GOMP to rapidly determine which grasp of multiple candidates leads to the fastest motion or to perform a cost-benefit analysis—for example, trading off some reliability of the grasp for speed of motion.

Use in other applications

We propose and experiment with an optimizing motion planning method in the context of a repeated pick-and-place scenario, in which the optimization is slowed down because of constraints on dynamics, obstacle avoidance, and degrees of freedom at pick and place points. We envision that other scenarios may also benefit from the proposed approach, including applications in manufacturing, assembly, painting, welding, inspection, robot-assisted surgery, construction, farming, and recycling. In each of these scenarios, the constraints in the optimization would need to adapt to the task, and the inputs to the system would also vary accordingly.

Opportunities for future research

In future work, we will explore expanding DJ-GOMP to additional robots performing more varied tasks that would include increased variation of start and goal configurations and in more complex environments. We will also explore additional deep-learning approaches to find better approximations of the optimization process and thus allow for faster warm starting of the final optimization step of DJ-GOMP. For systems without access to a GPU or other neural network accelerator, it may be fruitful to explore other routes to compute a warm-start trajectory, e.g., different/smaller network design, or a nearest trajectory from the training dataset (42). There may be potential for using a deep learning–based warm start to speed up constrained optimizations in other fields of robotics, e.g., grasp contact models (43), task planning (44, 45), and model predictive control (46, 47)—potentially allowing such algorithms to run at interactive rates and enabling new applications.


This section describes the methods in DJ-GOMP. Underlying DJ-GOMP is a jerk- and time-optimizing constrained motion planner based on an SQP. Because of the complexity of solving this SQP, computation time can far exceed the trajectory’s execution time. DJ-GOMP uses this SQP on a random set of pick-and-place inputs to generate training data (trajectories) to train a neural network. During pick-and-place operation, DJ-GOMP uses the neural network to compute an approximate trajectory for the given pick and place frames, which it then uses to warm start the SQP.

Jerk- and time-optimized trajectory generation

To generate a jerk- and time-optimized trajectory, DJ-GOMP extends the SQP formulated in GOMP (2). The solver for this SQP, following the method in TrajOpt (3) and summarized in Algorithm 1, starts with a discretized estimate of the trajectory τ as a sequence of H waypoints after the starting configuration, in which each waypoint represents the robot’s configuration q, velocity v, acceleration a, and jerk j at a moment in time. The waypoints are sequentially separated by tstep seconds. This discretization is collected into x(0), where the superscript represents a refinement iteration. Thusx(0)=(x0(0),x1(0),,xH(0)), where xt(k)=[qt(k)vt(k)at(k)jt(k)]

The choice of H and tstep is application specific, although in physical experiments, we set tstep to match (an integer multiple of) the control frequency of the robot, and we set H such that H · tstep is an estimate of the upper bound of the minimum trajectory time for the workspace and task input distribution.

The initial value of x(0) seeds (or warm starts) the SQP computation. Without the approximation generated by the neural network (e.g., for training data set generation), this trajectory can be initialized to all zeros. In practice, the SQP can converge faster by first computing a trajectory between inverse kinematic solutions to g0 and gH with only the convex kinematic and dynamic constraints (described below).

In each iteration k = (0,1,2, …, m) of the SQP, DJ-GOMP linearizes the nonconvex constraints of obstacles and pick-and-place locations and solves a QP of the following formx(k+1)=argminx  12xTPx+pTxs.t. Axbwhere A defines constraints enforcing the trust region, joint limits, and dynamics, and P is defined such that xTPx is a sum-of-squared jerks. To enforce the linearized nonconvex constraints, DJ-GOMP adds constrained nonnegative slack variables penalized using appropriate coefficients in p. As DJ-GOMP iterates over the SQP, it increases the penalty term exponentially, terminating on the iteration m at which x(m) meets the nonconvex constraints.

Algorithm 1: Jerk-limited Motion Plan

Require: x(0) is an initial guess of the trajectory, h + 1 is the number of waypoints in x(0), tstep is the time between each waypoint, g0 and gH are the pick and place frames, βshrink ∈ (0,1), βgrow > 1, and γ > 1

   1: μ ← initial penalty multiple

   2: ϵtrust ← initial trust region size

   3: k ← 0

   4: P, p, A, b ← linearize x(0) as a QP

   5: while μ < μmax do

   6:      x(k+1)argminx12xPx+px s.t.Axb /* warm start with x(k) */

   7:      if sufficient decrease in trajectory cost then

   8:        kk + 1                     /*accept trajectory */

   9:        ϵtrust ← ϵtrustβgrow      /* grow trust region */

   10:       A, b ← update linearization using x(k)

   11:    else

   12:    ϵtrust ← ϵtrustβshrink      /* shrink trust region */

   13:      b ← update trust region bounds only

   14:    if ϵtrust < ϵmin_trust then

   15:      μ ← γμ                       /* increase penalty */

   16:      ϵtrust ← initial trust region size

   17:      p ← update penalty in QP to match μ

   18: return x(k)

To enforce joint limits and dynamic constraints, Algorithm 1 creates a matrix A and a vector b that enforce the following linear inequalities on joint limitsqminqtqmaxvmaxvtvmaxamaxatamaxjmaxjtjmaxand the following equalities that enforce dynamic constraints between variablesqt+1=qt+tstepvt+12tstep2at+16tstep3jtvt + 1=vt+tstepat+12tstep2jtat+1=at+tstepjt

In addition, Algorithm 1 linearizes nonconvex constraints by adding slack variables to implement L1 penalties. Thus, for a nonconvex constraint gj(x) ≤ c, the algorithm adds the linearization g¯j(x) as a constraint in the formg¯j(x)μyj++μyjcwhere μ is the penalty, and the slack variables are constrained such that yj+0 and yj0.

In the QP, obstacle avoidance constraints are linearized on the basis of the waypoints of the current iteration’s trajectory (Algorithm 2). To compute these constraints, the algorithm evaluates the splineqspline(s;t)=qt+svt+12s2at+16s3jtbetween each pair of waypoints (xt, xt + 1) against a depth map of obstacles to find the time s ∈ [0, tstep) and corresponding configuration q̂t that minimizes signed distance separation from any obstacle. In this evaluation, a negative signed distance means that the configuration is in collision. The algorithm then uses this q̂t to computes a separating hyperplane in the form nTq + d = 0. The hyperplane is either the top plane of the obstacle it is penetrating or the plane that separates q̂t from the nearest obstacle (see Fig. 8). By selecting the top plane of the penetrated obstacle, this pushes the trajectory above the obstacle, which is a specialization of TrajOpt’s more general obstacle avoidance approach that is useful in bin picking. By selecting the hyperplane of the nearest obstacle, the algorithm keeps the trajectory from entering the obstacle. The linearize constraint for this point isnTĴt(k)x̂t(k+1)dnTp(x̂t(k))+nTĴt(k)x̂t(k)where Ĵt is the Jacobian of the robot’s position at q̂t. Because q̂t and Ĵt are at an interpolated state between optimization variables at xt and xt + 1, linearizing this constraint requires computing the chain rule for the JacobianĴt=Jp(q̂t)Jq(s)where Jp(q̂t) is the Jacobian of the position at configuration qt, and Jq(s) is the Jacobian of the configuration on the spline at sJq(s)=[pqtpqt+1pvtpvt+1]T=[3s2t2+2s3t3+13s2t22s3t32s2t+s3t3+ss3t2s2t]T

Fig. 8 Obstacle constraint linearization.

The constraint linearization process keeps the trajectory away from obstacles by adding constraints based on the Jacobian of the configuration at each waypoint of the accepted trajectory x(k). In this figure, the obstacle is shown from the side, the robot’s path along part of x(k) is shown in blue, and the constraints’ normal projections into Euclidean space are shown in red. For waypoints that are outside the obstacle (A), constraints keep the waypoints from entering the obstacle. For waypoints that are inside the obstacle (B), constraints push the waypoints up out of the obstacle. If the algorithm adds constraints only at waypoints as in (C), the optimization can compute trajectories that collide with obstacles and produce discontinuities between trajectories with small changes to the pick or place frame. These effects are mitigated when obstacles are inflated to account for them, but the discontinuities can lead to poor results when training the neural network. The proposed algorithm adds linearized constraints to account for collision between obstacles, leading to more consistent results shown in (D).

We observe that linearization at each waypoint will safely avoid obstacles with a sufficient buffer around obstacles (e.g., via a Minkowski difference with obstacles); however, slight variations in pick or place frames can shift the alignment of waypoints to obstacles. This shift of alignment (see Fig. 8C) can lead to solutions that vary disproportionately to small changes in input. Although this may be acceptable in operation, it can lead to data that can be difficult for a neural network to learn.

Algorithm 2: Linearize Obstacle-Avoidance Constraint

1: for t ∈ [0, H) do

2:      (nmin, dmin) ← linearize obstacle nearest to p(qt)

3:      qminqt

4:      for all s ∈ [0, tstep) do /* line search s to desired resolution */

5:        qsqt+svt+12s2at+16s3jt

6:        (ns, ds)← linearize obstacle nearest to p(qs)

7:        if nsp(qs)+ds<nminp(qmin)+dmin then /* compare signed distance */

8:           (nmin, dmin, qmin) ← (ns, ds, qs)

9:           Jq ← Jacobian of qs

10:     Jp ← Jacobian of position at qmin

11:     ĴtJpJq

12:     btdminnminp(qmin)+nminĴtxtμyt+ /* lower bound with slack yt+ */

13:     Add (nminĴtxtbt) and (yt+0) to set of linearconstraints in QP

As with GOMP, DJ-GOMP allows degrees of freedom in rotation and translation to be added to start and goal grasp frames. Adding this degree of freedom allows DJ-GOMP to take a potentially shorter path when an exact pose of the end effector is unnecessary. For example, a pick point with a parallel-jaw gripper can rotate about the axis defined by antipodal contact points (see Fig. 2), and the pick point with a suction gripper can rotate about the normal of its contact plane. Similarly, a task may allow for a place point anywhere within a bounded box. The degrees of freedom about the pick point can be optionally added as constraints that are linearized aswminJ0(k)q0(k+1)(g0p(q0(k)))+J0(k)q0(k)wminwhere q0(k) and J0(k) are the configuration and Jacobian of the first waypoint in the accepted trajectory, q0(k+1) is one of variables the QP is minimizing, and wminwmax defines the twist allowed about the pick point. Applying a similar set of constraints to gH allows degrees of freedom in the place frame as well.

The SQP establishes trust regions to constrain the optimized trajectory to be within a box with extents defined by a shrinking trust region size. Because convex constraints on dynamics enforce the relationship between configuration, velocity, and acceleration of each waypoint, we observe that trust regions only need to be defined as box bounds around one of the three for each waypoint. In experiments, we established trust regions on configurations.

Algorithm 3: Time-optimal Motion Plan

Require: g0 and gH are the start and end frames, γ > 1 is the search bisection ratio

   1: Hupper ← fixed or estimated upper limit of maximum time

   2: Hlower ← 3

   3: vupper ← ∞                         /* constraint violation */

   4: while vupper> tolerance do      /* find upper limit */

   5:      (xupper, vupper) ← call Alg. 1 with cold-start trajectory for Hupper

   6:      Hupper ← max(Hupper + 1, ⌈γ Hupper⌉)

   7: while Hlower < Hupper do        /* search for shortest H */

   8:      HminHlower + ⌊(HupperHlower)/γ⌋

   9:      (xmid, vmid) ← call Alg. 1 with warm-start trajectory xupper interpolated to Hmid

   10:    if vmid≤ tolerance then

   11:        (Hupper, xupper, vupper) ← (Hmid, xmid, vmid)

   12:    else

   13:        HlowerHmid + 1

   14: return xupper

To find the minimum time-time trajectory, J-GOMP searches for the shortest jerk-minimized trajectory that solves all constraints. This search, shown in Algorithm 3, starts with a guess of H and then performs an exponential search for the upper bound, followed by a binary search for the shortest H by repeatedly performing the SQP of Algorithm 1. The binary search warm starts each SQP with an interpolation of the trajectory of the current upper bound of H. The search ends when the upper and lower bounds of H are the same.

Deep learning of trajectories

To speed up motion planning, we add a deep neural network to the pipeline. This neural network treats the trajectory optimization process as a function fτ to approximatefτ:SE(3)×SE(3)H*×n×4where the arguments to the function are the pick and place frames, and the output is a discretized trajectory of variable length H* waypoints, each of which has a configuration, velocity, acceleration, and jerk for all n joints of the robot. We assume that the neural network f˜τ can only approximate fτ, thus f˜τ(·)=fτ(·)+E(τ) for some unknown error distribution E(τ). Hence, the output of f˜τ may not be dynamically or kinematically feasible. To address this potential issue, we use the network’s output to warm start a final pass through the SQP. This process can be thought of as polishing the output of the neural network approximation to overcome any errors in the network’s output.

In this section, we describe a proposed neural network architecture, its loss function, training and testing dataset generation, and the training process. Although we posit that a more general approximation could include the amount of pick and place degrees of freedom as inputs, for brevity, we assume that fτ and its neural network approximation are parameterized by a preset amount of pick and place degrees of freedom. In practice, it may also be appropriate to train multiple neural networks for different parameterizations of fτ.


The deep neural network architecture we propose is depicted in Fig. 3. It consists of an input layer connected through four fully connected blocks to multiple output blocks. The input layer takes in the concatenated grasp frames [g0TgHT]T. Because the optimal trajectory length H* can vary, the network has multiple output heads for each of the possible values for H*. To select which of the outputs to use, we use a separate classification network with two fully connected layers with one-hot encoding trained using a cross-entropy loss.

We refer to the horizon classification and multiple-output network as a HYDRA (Horizon Yielding Distillation through Retained Activations) network. The network yields both an optimal horizon length and the trajectory for that horizon. To train this network (detailed below), the trajectory output layers’ activation values for horizons not in the training sample are retained using a zero gradient so as to weight the contribution of the layers during backprop to the input layers.

In experiments, a neural network with a single output head was unable to produce a consistent result for predicting varied length horizons. We conjecture that the discontinuity between trajectories of different horizon lengths made it difficult to learn. In contrast, we found that a network was able to accurately learn a function for a single horizon length but was computationally and space inefficient, because each network should be able to share information about the function between the horizons. This led to the proposed design in which a single network with multiple output heads shares weights through multiple shared input layers.

Dataset generation

We propose generating a training dataset by randomly sampling start and end pairs from the likely distribution of tasks. For example, in a warehouse pick-and-place operation, the pick frames will be constrained to a volume defined by the picking bin, and the place frames will be constrained to a volume defined by the placement or packing bin. For each random input, we generate optimized trajectories for all time horizons from Hmax to the optimal H*. Although this process generates more trajectories than necessary, generating each trajectory is efficient because the optimization for a trajectory of size H warm starts from the trajectory of size H + 1. Overall, this process is efficient and, with parallelization, can quickly generate a large training dataset.

This process can also help detect whether the analysis of the maximum trajectory duration was incorrect. If all trajectories are significantly shorter than Hmax, then one may reduce the number of output heads. If any trajectory exceeds Hmax, then the number of output heads can be increased.

We also note that in the case where the initial training data do not match the operational distribution of inputs, the result may be that the neural network produces suboptimal motions that are substantially, kinematically, and dynamically infeasible. In this case, the subsequent pass through the optimization (see “Fast pipeline for trajectory generation” section) will fix these errors, although with a longer computation time. We propose, in a manner similar to DAgger (48), that trajectories from operation can be continually added to the training dataset for subsequent training or refinement of the neural network.


To train the network in a way that encourages matching the reference trajectory while keeping the output trajectory kinematically and dynamically feasible, we propose a multipart loss function ℒ. This loss function includes a weighted sum of MSE loss on the trajectory LT, a boundary loss LB, which enforces the correct start and end positions, and a dynamics loss Ldyn that enforces the dynamic feasibility of the trajectory. The MSE loss is the mean of the sum of squared differences of the two vector arguments: LMSE(a˜,a¯)=1nΣi=1n(a˜ia¯i)2. The dynamics loss attempts to mimic the convex constraints of the SQP. Given the predicted trajectories X˜=(x˜Hmin,,x˜Hmax), where x˜h=(q˜,v˜,a˜,j˜)t=0h and the ground-truth trajectories from dataset generation X¯=(x¯H*,,x¯Hmax), the loss functions areLT=αqLMSE(q˜,q¯)+αvLMSE(v˜,v¯)+αaLMSE(α˜,a¯)+αjLMSE(j˜,j¯)LB=LMSE(q˜0,q¯0)+LMSE(q˜H,q¯H)Ldyn=1hΣt=0h1q˜t+tstepv˜t+12tstep2a˜t+16tstep3j˜tq˜t+12   +1hΣt=0h1v˜t+tstepa˜t+12tstep2j˜tv˜t+12   +1hΣt=0h1a˜t+tstepj˜ta˜t+12   +1hΣt=0h11tstep(j¯t+1j¯t)1tstep(j˜t+1j˜t)2Lh=αTLTh+αBLBh+αdynLdynhwhere values of αq = 10, αv = 1, αa = 1, αj = 1, αB = 4 × 103, and αdyn = 1 were chosen empirically. This loss is combined into a single loss for the entire network by summing the losses of all horizons while multiplying by an indicator function for the horizons that are valid


Because the ground-truth trajectories for horizons shorter than H* are not defined, we must ensure that some finite data are present so that the multiplication of an indicator value of 0 results in 0 (and not NaN). Multiplying by indicator function in this way results in a zero gradient for the part of the network that is not included in the trajectory data.

During training, we observed that the network would often exhibit behavior of coadaptation in which it would learn either LT or Ldyn but not both. This showed up as a test loss for one going to small values, whereas the other remained high. To address this problem, we introduced dropout layers (49) with dropout probability pdrop = 0.5 between each fully connected layer, shown in Fig. 3. We annealed (50) pdrop to 0 over the course of the training to reduce the loss further.

Fast pipeline for trajectory generation

The end goal of this proposed motion planning pipeline is to generate feasible, time-optimized trajectories quickly. The SQP computes feasible, time-optimized trajectories but is slow when starting from scratch. The HYDRA neural network computes trajectories quickly (e.g., the forward pass on the network in the results section requires ∼1 to compute) but without guarantees on correctness. In this section, we propose combining the properties of the SQP and HYDRA into a pipeline (see Fig. 9) to get fast computation of correct trajectories by using a forward pass on the neural network to warm start the SQP.

Fig. 9 The fast motion planning pipeline.

The pipeline has three phases between input and robot execution. The first phase estimates the trajectory horizon H* by computing a forward pass of the neural network. The second phase estimates the trajectories for H* to create an initial trajectory for the SQP optimization process. The SQP then optimizes the trajectory, ensuring that it meets all joint kinematic and dynamic limits so that it can successfully execute on a robot.

The first step in the pipeline is to compute H˜*, an estimate of the optimal time horizon H*. This requires a single forward pass through the one-hot classification network. Because predicting horizons shorter than H* result in an infeasible SQP, it can be beneficial to either compute multiple SQPs around the predicted horizon or increase the horizon if the difference in the one-hot values for H˜* and H˜*+1 is within a threshold.

The second step in the pipeline is to compute x˜(0), an estimate of the time-optimal trajectory for H˜* using a forward pass through the HYDRA network.

The final step is to compute the trajectory using x˜(0) to warm start the SQP. In this step, because the warm-start trajectory is close to the final trajectory and generating a smooth training dataset is not a requirement, we can speed up the SQP process by relaxing the termination conditions to the tolerances of the robot and task, e.g., terminating when the pick point (and other constraints) is within 10−3 m of the target frame, instead of the 10−6 m used in dataset generation.

We observe that symmetry in grippers, such as found in parallel and multifinger grippers, means that multiple top-down grasps can result in the same contact points [e.g., see Fig. 2 (A and D)]. In this setting, we can use f˜H(·) to estimate optimal horizons for all the grasp configurations and quickly select the one with the lowest horizon. Experimentally, we find that breaking ties for optimal horizons using the associated one-hot values leads to faster trajectory optimization compute times.

Experimental hardware setup

The experimental workspace consists of two bins position in front of a UR5 robotic arm fitted with a Robotiq 2F-85 parallel-jaw gripper. DJ-GOMP’s network is trained on inputs in which the gripper picks from the bin in front of it and places in the bin to its right while avoiding the bin wall between the pick and place points. The pick frame allows a degree of freedom in rotation about the grasp axis on the pick point and a degree in left-right and forward-back translation (but not up-down).

Experimental procedure

We generate uniform random pick and place points from box volumes bounded by their respective bins and with random top-down rotation of 0° to 180°. For each pick-place pair, we compute a J-GOMP trajectory to all four combinations of their symmetric grasp points. The resulting dataset consists of 100,000 (input, trajectory) pairs × 4 for the symmetric grasps. With this dataset, we train the neural network. In experiments, we use a different set of 1000 random inputs from the same distribution to compare the time to compute an optimized trajectory with and without warm start. We run a subset of these results on the physical robot to spot check that the generated trajectories will run on the UR5.


Movie S1. Example of motions computed by grasp-optimized motion planning with a deep-learning warm start.


Acknowledgements: This research was performed at the AUTOLAB at UC Berkeley in affiliation with the Berkeley AI Research (BAIR) Lab, Berkeley Deep Drive (BDD), the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS “People and Robots” (CPAR) Initiative. We thank our colleagues who provided helpful feedback and suggestions, particularly A. Balakrishna and B. Thananjeyan. Funding: We were also supported by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, a NSF National Robotics Initiative Award 1734633, and in part by donations from Google and Toyota Research Institute. Author contributions: J.I. devised the method and neural network design, designed and ran the experiments, and wrote the manuscript. Y.A. designed and experimented with neural network data generation and training and edited the manuscript. V.S. designed and implemented the neural network training and edited the manuscript. K.G. supervised the project, advised the design and experiments, and edited the manuscript. Competing interests: J.I., Y.A., V.S., and K.G. are co-inventors on a patent application related to this work. Ambidextrous Robotics, a startup company commercializing algorithms for robot grasping, has no financial interest and played no role in the work presented in this paper: V.S. has worked there as a summer intern, and K.G. is part-time Chief Scientist there. Data and materials availability: All data needed to evaluate the conclusions in this paper are present in the paper. This article solely reflects the opinions and conclusions of its authors and does not reflect the views of the sponsors or their associated entities.
View Abstract

Stay Connected to Science Robotics

Navigate This Article