We have developed a scalable method for parallelizing sampling-based planning algorithms. The method subdivides the configuration space (C-space) into regions and uses (sequential) sampling-based planners to build roadmaps in each region. The regional roadmaps are later connected to form a roadmap of the entire free space. By subdividing the space,we reduce the work and inter-processor communication associated with nearest neighbor calculations, a critical drawback and bottleneck to scalability in previous parallel motion planning methods.

There are two main types of sampling-based planning methods, graph-based methods used when multiple queries will be performed in the same environment (e.g., Probabilistic Roadmap Methods, or PRMs) and tree-based approached most suitable for single query settings (e.g., Rapidly-explorign Random Trees, or RRTs). We show how both types of methods can be parallelized efficiently.

Parallelizing Graph-based Methods

The multi-query graph-based methods (e.g., PRMs) build a graph during preprocessing which is later used to answer many queries. The constuction of the graph consists of two main phases, node generation and node connection, both of which can be easily parallelized. To generate n random nodes in parallel, we simply ask each of the p processors to generate n/p of them. Effective parallelization of the connection phase is essential to obtain scalable speedups since this typically accounts for 98% of the sequential computation time. A simple sequential connection algorithm attempts to connect every node to its k nearest neighbors. Suppose we group the n nodes into p sets, one for each processor. Then, a simple parallel solution is for each processor to compute the k nearest neighbors for its set of nodes and attempt the connections.

Parallelizing Tree-based Methods

The single-query tree-based methods (e.g., RRTs) have been more challenging to parallelize efficiently due to the sequential nature of the tree growth. In one approach, Radial RRT, we demonstrated a scalable algorithm that subdivides the space radially into regions to increase the computation locality. However, if an obstacle completely blocks RRT growth in a region, the planning space is not covered and the method is thus not probabilistically complete. We addressed this with another version, Blind RRT, which ignores obstacles during initial growth to efficiently explore the entire space. Because obstacles are ignored, free components of the tree become disconnected and fragmented. Blind RRT merges parts of the tree that have become disconnected from the root. We show how this algorithm can be applied to the Radial RRT framework allowing both scalability and effectiveness in motion planning. This method is a probabilistically complete approach to parallel RRTs. We show that our method not only scales but also overcomes the motion planning limitations that Radial RRT has in a series of difficult motion planning tasks.

Distributed RRT Illustration
Figure 1: Example of a radial subdivision for a 2D CSpace from a root node. Each process concurrently builds a subtree from the root and attempts to connect to goal.

Results

We show that our method (pSBMP-X) is general enough to handle both graph-based (pSBMP-PRM) and tree-based (pSBMP-RRT) approaches. We compare our approach (pSBMP-PRM and pSBMP-RRT) to two other existing parallel algorithms (pPRM and pSRT) and demonstrate that our approach achieves better and more scalable performance (shown in Figure 1). Our approach achieves almost linear scalability to hundreds of cores (shown in Figure 2) on a 2400 cores Linux cluster of 13.8 tera-flop peak performance at Texas A&M University and over a thousand cores (shown in Figure 3) on a Cray XE6 petascale machine with peak performance of 1.288 peta-flops and 153,216 cores at Lawrence Berkeley National Laboratory.

Performance Comparison
Figure 2: Performance Comparison with Existing Approaches
Scalability on Linux Cluster
Figure 3: Scalability on Linux Cluster
Scalability on Cray XE6 Machine
Figure 4: Scalability on Cray XE6 Machine

Load Balancing Strategies

These strategies based on uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms can scale well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the sub problems increases as the number of processors increases. We introduced two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We proposed two techniques to address the problems of load imbalance. The first is a bulk-synchronous redistribution technique that redistributes regions among processors so that they have a more balanced distribution of data. The method approximates the amount of work a region will requre that is based on the complexity of a region and use it to attempt to balance work across processors, while also preserving the spatial geometry of the subdivision. The second approach is an adaptive work-stealing approach that permantently migrates both the region and the work related to it to improve the load balance.

Our results show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores.

Load balance example (a) Load balance example (b)
Figure 5: When uniformly subdividing an environment as shown in our 2D example in (a), we divide along the environment's workspace degrees of freedom, with some overlap. Each region is then mapped by the Region Graph as seen in (b).
Imbalanced environment
Figure 6: An imbalanced environment and roadmap graph node distribution.
Before rebalancing (b)
(6b) Before rebalancing for parallel PRM
After rebalancing (c)
(6c) After rebalancing for parallel PRM
Measure of load imbalance (a)
Figure 7a: Experimental validation of measure of load imbalance and
Potential improvement (b)
Figure 7b: Experimental validation of potential improvement in model environment.
Evaluation (a) Evaluation (b) Evaluation (c)
Figure 8: Evaluation of (a) execution time and (b) coefficient of variation and (c) load distribution for PRM on HOPPER using med-cube.

Publications

Updated: