ASCI Scheduler, or Task Scheduling and Deterministic Mesh Sweeps

supported by NSF, DOE
Olga Pearce, Marvin Adams, Nancy Amato
Project Alumni: Ping An

Task Scheduling | Scheduling Algorithms | STAPL Scheduler | ASCI Scheduler

Scheduling Sweeps for Particle Transport Calculations

In this work, we consider the relationship between task scheduling and deterministic mesh sweeps that arise in particle-transport computations. In particular, we argue that the efficient parallelization of such computations is most accurately viewed as a generalization of the traditional task scheduling problem and not as an application for domain decomposition, as has been generally assumed in the past. In fact, as we show, the transport problem represents an interesting composite task-scheduling problem: given one set of tasks, and multiple dependence graphs for these tasks (one for each distinguishable sweep direction), find an assignment of tasks to processors that minimizes the time required to process all such graphs. Within this context, we study and propose scheduling algorithms that are suitable for the particular generalization of the scheduling problem that arises in the context of transport sweeps.

This project is an application of STAPL. Based on STAPL Scheduler, we developed ASCI Scheduler for this particular problem.

ASCI Scheduler

ASCI scheduler is developed for the ASCI project to optimize the Particle transport Calculations on parallel systems, specificly for the scheduling of multiple dependence computation graphs emerged from the ASCI project. The problem is classified into two categories:

Parallel Sweeps on Orthogonal Grids
Parallel Sweeps on Arbitrary Grids

For general scheduling to apply to both orthogonal and unstructured grids, we define the following functionalities of the scheduler.

Scheduling

In the Scheduling phase, the unique schedule to be used for all the dependence graphs is computed. The goal of the scheduling is to find the optimal mappings of tasks onto processors, such that the total computation time of the whole sweep can be minimized.

For orthogonal grids, we have KBA, Hybrid, and Volumetric methods. There are also options for the users to define their own processor mappings. The theoretical analysis of these algorithms can be found here.

For unstructured grids, we use the heuristics implemented in STAPL Scheduler, namely wave front method, critical path method, heavy edge merge, and dominant sequence clustering, to compute the best schedule for the multiple dependence graphs.

Due to the existence of various machine topologies in supercomputing field, we adopt a hierarchical scheduling policy, such that each processor partition can be scheduled iteratively when necessary. In the following example, processor p1 can be viewed as a virtual processor which represents two physical processors p10 and p11. The hierarchical scheduling provides flexibility in mapping to different machine topologies.

Cellset Aggregation

To further optimize the computation, the scheduled computation graph is further clustered into a graph on cellsets. Each node in the graph contains a set of cells, which composes a computation block before each communication. The purpose of aggregation is to properly overlap the computation and communication, and balance the communication overhead with the cost of data transferring. The optimal cellset size is machine dependent, and can be computed based on machine parameters. There are theoretical analysis results for orthogonal cases for the optimal cellset size (description here). To get the optimal values for unstructured grids, we can use simulation results.

Graph Transformation

Based on the sweep angles, the aggregated cellset graph is transformed into different directed cellset graph in this phase. The transformation is based on the angle between the face normal of a pair of cellsets and the sweep angle. If the angle is within 90 degree, there is a directed edge between the cellsets, otherwise, there is no dependency between them.

Cell Ordering

Cell Ordering is to find out the execution order of the cells in a cellset for different sweep angles. The basic ordering uses a topological sort on the directed graph between cells.

Here is an implementation flow chart to show the major steps in the ASCI Scheduler implementation.

Related Projects

Task Scheduling
Scheduling Algorithms
STAPL Scheduler

Papers

Load Balancing Techniques for Scalable Parallelization of Sampling-Based Motion Planning Algorithms, Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Lawrence Rauchwerger, Nancy M. Amato, Technical Report, TR13-002 , Parasol Laboratory, Department of Computer Science, Texas A&M University, Mar 2013.
Technical Report(pdf, abstract)

Task Scheduling and Parallel Mesh-Sweeps in Transport Computations, Nancy M. Amato, Ping An, Technical Report, TR00-009, Department of Computer Science and Engineering, Texas A&M University, Jan 2000.
Technical Report(ps, pdf)