Publications | Software & Systems Group
Publications

 

This page lists publications by the Parasol Laboratory. Please note that the copyrights of the papers with full citations have been signed over to the publishers.



Author Name: Between years: and

Keywords: Search shall contain:


Publications

Avoidance Critical Probabilistic Roadmaps for Motion Planning in Dynamic Environments, Felipe Felix Arias, Brian Ichter, Aleksandra Faust, Nancy M. Amato, IEEE International Conference on Robotics and Automation (ICRA), Xien, China, Jun 2021. DOI: Published
Keywords: Machine Learning, Mobile Robots, Multi-Agent
Links : [Published] [Video]

BibTex

@inproceedings{50155,
title = {Avoidance Critical Probabilistic Roadmaps for Motion Planning in Dynamic Environments},
author = {Felipe Felix Arias and Brian Andrew Ichter and Aleksandra Faust and Nancy M. Amato},
year = {2021},
booktitle = {IEEE International Conference on Robotics and Automation (ICRA)}
}


Abstract

Motion planning among dynamic obstacles is an essential capability towards navigation in the real-world. Sampling-based motion planning algorithms find solutions by approximating the robot’s configuration space through a graph representation, predicting or computing obstacles’ trajectories, and finding feasible paths via a pathfinding algorithm. In this work, we seek to improve the performance of these subproblems by identifying regions critical to dynamic environment navi- gation and leveraging them to construct sparse probabilistic roadmaps. Motion planning and pathfinding algorithms should allow robots to prevent encounters with obstacles, irrespective of their trajectories, by being conscious of spatial context cues such as the location of chokepoints (e.g., doorways). Thus, we propose a self-supervised methodology for learning to identify regions frequently used for obstacle avoidance from local environment features. As an application of this concept, we leverage a neural network to generate hierarchical probabilistic roadmaps termed Avoidance Critical Probabilistic Roadmaps (ACPRM). These roadmaps contain motion structures that enable efficient obstacle avoidance, reduce the search and planning space, and increase a roadmap’s reusability and coverage. ACPRMs are demonstrated to achieve up to five orders of magnitude improvement over grid-sampling in the multi-agent setting and up to ten orders of magnitude over a competitive baseline in the multi-query setting.


Representation-Optimal Multi-Robot Motion Planning using Conflict-Based Search, Irving Solis, James Motes, Read Sandström, Nancy M. Amato, IEEE Robotics and Automation Letters, Mar 2021. DOI: NA
Keywords: Motion Planning, Multi-Agent
Links :

BibTex

@article{solis2019representation,
title={Representation-optimal multi-robot motion planning using conflict-based search},
author={Solis, Irving and Sandstr{\"o}m, Read and Motes, James and Amato, Nancy M},
journal={arXiv preprint arXiv:1909.13352},
year={2019}
}


Abstract

Multi-Agent Motion Planning (MAMP) is the problem of computing feasible paths for a set of agents each with individual start and goal states within a continuous state space. Existing approaches can be split into coupled methods which provide optimal solutions but struggle with scalability or decoupled methods which provide scalable solutions but offer no optimality guarantees. Recent work has explored hybrid approaches that leverage the advantages of both coupled and decoupled approaches in an easier discrete subproblem, Multi-Agent Pathfinding (MAPF). In this work, we adapt recent developments in hybrid MAPF to the continuous domain of MAMP. We demonstrate the scalability of our method to manage groups of up to 32 agents, demonstrate the ability to handle up to 8 high-DOF manipulators, and plan for heterogeneous teams. In all scenarios, our approach plans significantly faster while providing higher quality solutions.


Representation-Optimal Multi-Robot Motion Planning using Conflict-Based Search, Irving Solis, James Motes, Read Sandström, Nancy M. Amato, IEEE Robotics and Automation Letters, Mar 2021. DOI: NA
Keywords: Motion Planning, Multi-Agent
Links :

BibTex

@article{solis2019representation,
title={Representation-optimal multi-robot motion planning using conflict-based search},
author={Solis, Irving and Sandstr{\"o}m, Read and Motes, James and Amato, Nancy M},
journal={arXiv preprint arXiv:1909.13352},
year={2019}
}


Abstract

Multi-Agent Motion Planning (MAMP) is the problem of computing feasible paths for a set of agents each with individual start and goal states within a continuous state space. Existing approaches can be split into coupled methods which provide optimal solutions but struggle with scalability or decoupled methods which provide scalable solutions but offer no optimality guarantees. Recent work has explored hybrid approaches that leverage the advantages of both coupled and decoupled approaches in an easier discrete subproblem, Multi-Agent Pathfinding (MAPF). In this work, we adapt recent developments in hybrid MAPF to the continuous domain of MAMP. We demonstrate the scalability of our method to manage groups of up to 32 agents, demonstrate the ability to handle up to 8 high-DOF manipulators, and plan for heterogeneous teams. In all scenarios, our approach plans significantly faster while providing higher quality solutions.


Topology-Guided Roadmap Construction with Dynamic Region Sampling, Read Sandström, Diane Uwacu, Jory Denny, Nancy M. Amato, IEEE Robotics and Automation Letters (RA-L), vol. 5, no. 4,, pp. 6161-6168, virtual, Oct 2020. DOI: 10.1109/LRA.2020.3010487
Keywords: Sampling-Based Motion Planning, Workspace Topology
Links : [Published]

BibTex

@ARTICLE{9144398,
author={R. {Sandström} and D. {Uwacu} and J. {Denny} and N. M. {Amato}},
journal={IEEE Robotics and Automation Letters},
title={Topology-Guided Roadmap Construction With Dynamic Region Sampling},
year={2020},
volume={5},
number={4},
pages={6161-6168},}


Abstract

Many types of planning problems require discovery of multiple pathways through the environment, such as multi-robot coordination or protein ligand binding. The Probabilistic Roadmap (PRM) algorithm is a powerful tool for this case, but often cannot efficiently connect the roadmap in the presence of narrow passages. In this letter, we present a guidance mechanism that encourages the rapid construction of well-connected roadmaps with PRM methods. We leverage a topological skeleton of the workspace to track the algorithm's progress in both covering and connecting distinct neighborhoods, and employ this information to focus computation on the uncovered and unconnected regions. We demonstrate how this guidance improves PRM's efficiency in building a roadmap that can answer multiple queries in both robotics and protein ligand binding applications.


Using Guided Motion Planning to Study Binding Site Accessibility, Diane Uwacu, Abigail Ren, Shawna Thomas, Nancy M. Amato, Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Issue: 109, (Virtual) New York, USA, Sep 2020. DOI: 10.1145/3388440.3414707
Keywords: Computational Biology, Ligand Binding, Motion Planning
Links : [Published] [Manuscript]

BibTex

@inbook{10.1145/3388440.3414707,
author = {Uwacu, Diane and Ren, Abigail and Thomas, Shawna and Amato, Nancy M.},
title = {Using Guided Motion Planning to Study Binding Site Accessibility},
year = {2020},
isbn = {9781450379649},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3388440.3414707},
abstract = {Computational methods are commonly used to predict protein-ligand interactions. These methods typically search for regions with favorable energy that geometrically fit the ligand, and then rank them as potential binding sites. While this general strategy can provide good predictions in some cases, it does not do well when the binding site is not accessible to the ligand. In addition, recent research has shown that in some cases protein access tunnels play a major role in the activity and stability of the protein's binding interactions. Hence, to fully understand the binding behavior of such proteins, it is imperative to identify and study their access tunnels. In this work, we present a motion planning algorithm that scores protein binding site accessibility for a particular ligand. This method can be used to screen ligand candidates for a protein by eliminating those that cannot access the binding site. This method was tested on two case studies to analyze effects of modifying a protein's access tunnels to increase activity and/or stability as well as study how a ligand inhibitor blocks access to the protein binding site.},
booktitle = {Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics},
articleno = {109},
numpages = {10}
}




Abstract

Computational methods are commonly used to predict protein-ligand interactions. These methods typically search for regions with favorable energy that geometrically fit the ligand, and then rank them as potential binding sites. While this general strategy can provide good predictions in some cases, it does not do well when the binding site is not accessible to the ligand. In addition, recent research has shown that in some cases protein access tunnels play a major role in the activity and stability of the protein's binding interactions. Hence, to fully understand the binding behavior of such proteins, it is imperative to identify and study their access tunnels. In this work, we present a motion planning algorithm that scores protein binding site accessibility for a particular ligand. This method can be used to screen ligand candidates for a protein by eliminating those that cannot access the binding site. This method was tested on two case studies to analyze effects of modifying a protein's access tunnels to increase activity and/or stability as well as study how a ligand inhibitor blocks access to the protein binding site.


Dynamic Region-biased Rapidly-exploring Random Trees, Jory Denny and Read Sandstrom and Andrew Bregger and Nancy M. Amato, Algorithmic Foundations of Robotics XII. Springer Proceedings in Advanced Robotics (SPAR). The 2016 Workshop on the Algorithmic Foundations of Robotics (WAFR), Vol: 13, pp. 640-655, May 2020. DOI: 10.1007/978-3-030-43089-4_41
Keywords: Sampling-Based Motion Planning, Workspace Topology
Links : [Published]

BibTex

@article{denny-wafr-2016
, author = "Jory Denny and Read Sandstrom and Andrew Bregger and Nancy M. Amato"
, title = "Dynamic Region-biased Rapidly-exploring Random Trees"
, journal = "Algorithmic Foundations of Robotics XII. Springer Proceedings in Advanced Robotics (SPAR)"
, volume = "13"
, publisher = "Springer, Cham."
, pages = "640-655"
, year = "2020"
, doi = "10.1007/978-3-030-43089-4\_41"
, note = "Presented at the 2016 Workshop on the Algorithmic Foundations of Robotics (WAFR)."
}


Abstract

Current state-of-the-art motion planners rely on samplingbased planning to explore the problem space for a solution. However, sampling valid configurations in narrow or cluttered workspaces remains a challenge. If a valid path for the robot correlates to a path in the workspace, then the planning process can employ a representation of the workspace that captures its salient topological features. Prior approaches have investigated exploiting geometric decompositions of the workspace to bias sampling; while beneficial in some environments, complex narrow passages remain challenging to navigate. In this work, we present Dynamic Region-biased RRT, a novel samplingbased planner that guides the exploration of a Rapidly-exploring Random Tree (RRT) by moving sampling regions along an embedded graph that captures the workspace topology. These sampling regions are dynamically created, manipulated, and destroyed to greedily bias sampling through unexplored passages that lead to the goal. We show that our approach reduces online planning time compared with related methods on a set of maze-like problems.


Fast Collision Detection for Motion Planning Using Shape Primitive Skeletons, Mukulika Ghosh and Shawna Thomas and Nancy M. Amato, Algorithmic Foundations of Robotics XIII. Springer Proceedings in Advanced Robotics (SPAR). The 2018 Workshop on the Algorithmic Foundations of Robotics (WAFR), Vol: 14, pp. 36-51, Springer, Cham, May 2020. DOI: 10.1007/978-3-030-44051-0_3
Keywords: Collision Detection, Computational Geometry, Motion Planning
Links : [Published]

BibTex

@article{ghosh-wafr-2018
, author = "Mukulika Ghosh and Shawna Thomas and Nancy M. Amato"
, title = "Fast Collision Detection for Motion Planning Using Shape Primitive Skeletons"
, journal = "Algorithmic Foundations of Robotics XIII. Springer Proceedings in Advanced Robotics (SPAR)"
, publisher = "Springer, Cham."
, volume = "14"
, pages = "36-51"
, year = "2020"
, doi = "10.1007/978-3-030-44051-0_3"
}


Abstract

In many robotics applications, the environment (robots and obstacles) often have very complex geometries. These result in expensive primitive computations such as collision detection which in turn, affect the overall performance of these applications. Approximating the geometry is a common approach to optimize computation. Unlike other applications of geometric approximation where it is applied to one space (usually obstacle space), we approximate both obstacle and free workspace with a set of geometric shape primitives that are completely contained within the space and represent its topology (skeleton). We use these “shape primitive skeletons” to improve collision detection performance in motion planning algorithms. Our results show that the use of shape primitive skeletons improves the performance of standard collision detection methods in motion planning problems by 20–70% in our 2D and 3D test environments regardless of motion planning strategy. We also show how the same shape primitive skeletons can be used with robots of different sizes to improve the performance of collision detection operation.


Provably Optimal Parallel Transport Sweeps on Semi-Structured Grids, Michael P. Adams, Marvin L. Adams, W. Daryl Hawkins, Timmie G. Smith, Lawrence Rauchwerger, Nancy M. Amato, Teresa S. Bailey, Robert D. Falgout, Adam Kunen, Peter Brown, Journal of Computational Physics, Vol: 407, Apr 2020. DOI: 10.1016/j.jcp.2020.109234
Keywords: Parallel Algorithms, Scientific Computing
Links : [Published]

BibTex

@article{DBLP:journals/jcphy/AdamsAHSRABFKB20,
author = {Michael P. Adams and
Marvin L. Adams and
W. Daryl Hawkins and
Timmie G. Smith and
Lawrence Rauchwerger and
Nancy M. Amato and
Teresa S. Bailey and
Robert D. Falgout and
Adam Kunen and
Peter Brown},
title = {Provably optimal parallel transport sweeps on semi-structured grids},
journal = {J. Comput. Phys.},
volume = {407},
pages = {109234},
year = {2020},
url = {https://doi.org/10.1016/j.jcp.2020.109234},
doi = {10.1016/j.jcp.2020.109234},
timestamp = {Fri, 27 Mar 2020 14:16:32 +0100},
biburl = {https://dblp.org/rec/journals/jcphy/AdamsAHSRABFKB20.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on a class of grids in 2D and 3D Cartesian geometry that are regular at a coarse level but arbitrary within the coarse blocks. We describe these algorithms and show that they always execute the full eight-octant (or four-quadrant if 2D) sweep in the minimum possible number of stages for a given Px x Py x Pz partitioning. Computational results confirm that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. Our PDT transport code has achieved approximately 68% parallel efficiency with > 1.5M parallel threads, relative to 8 threads, on a simple weak-scaling problem with only three energy groups, 10 direction per octant, and 4096 cells/core. We demonstrate similar efficiencies on a much more realistic set of nuclear-reactor test problems, with unstructured meshes that resolve fine geometric details. These results demonstrate that discrete-ordinates transport sweeps can be executed with high efficiency using more than 106 parallel processes.


Multi-Robot Task and Motion Planning with Subtask Dependencies, James Motes, Read Sandstrom, Hannah Lee, Shawna Thomas, Nancy M. Amato, IEEE Robotics and Automation Letters (RA-L), Vol: 5, Issue: 2, pp. 3338-3345, Feb 2020. DOI: 10.1109/LRA.2020.2976329
Keywords: Motion Planning, Multi-Agent, Task Planning
Links : [Published]

BibTex

@article{motes2020multi,
title={Multi-Robot Task and Motion Planning With Subtask Dependencies},
author={Motes, James and Sandstr{\"o}m, Read and Lee, Hannah and Thomas, Shawna and Amato, Nancy M},
journal={IEEE Robotics and Automation Letters},
volume={5},
number={2},
pages={3338--3345},
year={2020},
publisher={IEEE}
}


Abstract

We present a multi-robot integrated task and motion method capable of handling sequential subtask dependencies within multiply decomposable tasks. We map the multi-robot pathfinding method, Conflict Based Search, to task planning and integrate this with motion planning to create TMP-CBS. TMP-CBS couples task decomposition, allocation, and planning to support cases where the optimal solution depends on robot availability and inter-team conflict avoidance. We show improved planning time for simpler task sets and generate optimal solutions w.r.t. the state space representation for a broader range of problems than prior methods.


Interaction Templates for Multi-Robot Systems, James Motes, Read Sandstrom, Will Adams, Tobi Ogunyale, Shawna Thomas, Nancy M. Amato, IEEE Robotics and Automation Letters, Vol: 4, Issue: 3, pp. 2926-2933, Jun 2019. DOI: 10.1109/LRA.2019.2923386
Keywords: Interaction, Multi-Agent, Task Planning
Links : [Published]

BibTex

@article{motes2019interaction,
title={Interaction templates for multi-robot systems},
author={Motes, James and Sandstr{\"o}m, Read and Adams, Will and Ogunyale, Tobi and Thomas, Shawna and Amato, Nancy M},
journal={IEEE Robotics and Automation Letters},
volume={4},
number={3},
pages={2926--2933},
year={2019},
publisher={IEEE}
}


Abstract

This letter describes a framework for multi-robot problems that require or utilize interactions between robots. Solutions consider interactions on a motion planning level to determine the feasibility and cost of the multi-robot team solution. Modeling these problems with current integrated task and motion planning (TMP) approaches typically requires reasoning about the possible interactions and checking many of the possible robot combinations when searching for a solution. We present a multi-robot planning method called Interaction Templates (ITs), which moves certain types of robot interactions from the task planner to the motion planner. ITs model interactions between a set of robots with a small roadmap. This roadmap is then tiled into the environment and connected to the robots’ individual roadmaps. The resulting combined roadmap allows interactions to be considered by the motion planner. We apply ITs to homogeneous and heterogeneous robot teams under both required and optional cooperation scenarios, which previously required a task planning method. We show improved performance over a current TMP planning approach.


Feasibility Study of Robotic Needles with a Rotational Tip-Joint and Notch Patterns, Shivanand Pattanshetti, Read Sandström, Abhishek Kottala, Nancy M. Amato, Seok Chang Ryu, Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 1534-1540, Montreal, QC, Canada, Canada, May 2019. DOI: 10.1109/ICRA.2019.8793574
Keywords: Medical Robotics, Motion Planning
Links : [Published]

BibTex

@inproceedings{pattanshetti2019feasibility,
title={Feasibility Study of Robotic Needles with a Rotational Tip-Joint and Notch Patterns},
author={Pattanshetti, Shivanand and Sandstr{\"o}m, Read and Kottala, Abhishek and Amato, Nancy M and Ryu, Seok Chang},
booktitle={2019 International Conference on Robotics and Automation (ICRA)},
pages={1534--1540},
year={2019},
organization={IEEE}
}


Abstract

In this paper, we present the design of a steerable needle with proximal notch patterns for compliance and an embedded rotational tip joint for articulation. The device is fabricated by laser machining NiTi tube so that an inner working channel exists (to enable delivery of fluids, drugs or microtools) and no assembly is required for the joints. We formulate its model based on the classical Cosserat Rod theory. This is extended with incremental state prediction and a simple spring model for tissue reaction to integrate into a planning algorithm based on Dynamic Region RRT which efficiently explores the needle's state space. The planner was initialized with a target zone and arbitrary anatomical obstacles before running simulations which propagated incremental state changes at every step while adhering to constraints based on the physical system. Finally, we demonstrate the steering capability of the needle through insertion tests into a phantom.


Using Motion Planning to Evaluate Protein Binding Site Accessibility, Diane Uwacu, Everett Yang, Shawna Thomas, Nancy M. Amato, Technical Report, TR18-001, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station TX 77848, USA, Jul 2018.
Keywords: Ligand Binding, Motion Planning, Workspace Topology
Links : [Manuscript]

BibTex

N/A


Abstract

Despite many efforts and considerable breakthroughs in ligand binding prediction, the best predictors still produce many false positives and a reliable, fully automated prediction framework has yet to be developed. Binding site accessibility is an important feature ignored by methods that classify binding based solely on the energetic or geometric properties of the final bound protein-ligand complex. To evaluate this necessity, we transform the ligand accessibility problem into a robot motion planning problem where the ligand is modeled as a flexible agent whose task is to travel from outside the protein to its binding site. We use Rapidly-exploring Random Graphs coupled with Mean Curve workspace skeletons to quickly and thoroughly explore a protein environment in order to produce valid paths for ligand motion. Path weights reflect the influences of intermolecular forces on the given ligand. Low weight paths are extracted and analyzed for characteristics of accessibility. In this paper, we show that our algorithm provides a mechanism to evaluate binding site accessibility for a ligand.


Sampling-based motion planning with reachable volumes for high-degree-of-freedom manipulators, Troy McMahon, Shawna Thomas, and Nancy M Amato, The International Journal of Robotics Research, Vol: 37, Issue: 7, pp. 779-817, Jun 2018. DOI: 10.1177/0278364918779555
Keywords: Manipulation Planning, Reachability Analysis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{mcmahon2018sampling,
title={Sampling-based motion planning with reachable volumes for high-degree-of-freedom manipulators},
author={McMahon, Troy and Thomas, Shawna and Amato, Nancy M},
journal={The International Journal of Robotics Research},
volume={37},
number={7},
pages={779--817},
year={2018},
publisher={SAGE Publications Sage UK: London, England}
}


Abstract

Motion planning for constrained systems is a version of the motion planning problem in which the motion of a robot is limited by constraints. For example, one can require that a humanoid robot such as a PR2 remain upright by constraining its torso to be above its base or require that an object such as a bucket of water remain upright by constraining the vertices of the object to be parallel to the robot’s base. Grasping can be modeled by requiring that the end effectors of the robot be located at specified handle positions. Constraints might require that the robot remain in contact with a surface, or that certain joints of the robot remain in contact with each other (e.g., closed chains). Such problems are particularly difficult because the constraints form a manifold in C-space, and planning must be restricted to this manifold. High-degree-of-freedom motion planning and motion planning for constrained systems has applications in parallel robotics, grasping and manipulation, computational biology and molecular simulations, and animation. We introduce a new concept, reachable volumes, that are a geometric representation of the regions the joints and end effectors of a robot can reach, and use it to define a new planning space called RV-space where all points automatically satisfy a problem’s constraints. Visualizations of reachable volumes can enable operators to see the regions of workspace that different parts of the robot can reach. Samples and paths generated in RV-space naturally conform to constraints, making planning for constrained systems no more difficult than planning for unconstrained systems. Consequently, constrained motion planning problems that were previously difficult or unsolvable become manageable and in many cases trivial. We introduce tools and techniques to extend the state-of-the-art sampling-based motion planning algorithms to RV-space. We define a reachable volumes sampler, a reachable volumes local planner, and a reachable volumes distance metric. We showcase the effectiveness of RV-space by applying these tools to motion planning problems for robots with constraints on the end effectors and/or internal joints of the robot. We show that RV-based planners are more efficient than existing methods, particularly for higher-dimensional problems, solving problems with 1000 or more degrees of freedom for multi-loop and tree-like linkages.


Topological Nearest-Neighbor Filtering for Sampling-Based Planners, Read Sandstrom and Andrew Bregger and Ben Smith and Shawna Thomas and Nancy M. Amato, Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 3053-3060, Brisbane, Australia, May 2018. DOI: 10.1109/ICRA.2018.8460896
Keywords: Sampling-Based Motion Planning, Workspace Topology
Links : [Published]

BibTex

@inproceedings{sandstrom-icra-2018
, author = "Read Sandstrom and Andrew Bregger and Ben Smith and Shawna Thomas and Nancy M. Amato"
, title = "Topological Nearest-Neighbor Filtering for Sampling-Based Planners"
, booktitle = "Proc. IEEE Int. Conf. Robot. Autom. (ICRA)"
, location = "Brisbane, Australia"
, month = "May"
, year = "2018"
, pages = "3053-3060"
, doi = "10.1109/ICRA.2018.8460896"
}


Abstract

Nearest-neighbor finding is a major bottleneck for sampling-based motion planning algorithms. The cost of finding nearest neighbors grows with the size of the roadmap, leading to significant slowdowns for problems which require many configurations to find a solution. Prior work has investigated relieving this pressure with quicker computational techniques, such as kd-trees or locality-sensitive hashing. In this work, we investigate an alternative direction for expediting this process based on workspace connectivity. We present an algorithm called Topological Nearest-Neighbor Filtering, which employs a workspace decomposition to select a topologically relevant set of candidate neighbor configurations as a pre-processing step for a nearest-neighbor algorithm. We investigate the application of this filter to several varieties of RRT and demonstrate that the filter improves both nearest-neighbor time and overall planning performance.


A General and Flexible Search Framework for Disassembly Planning, Timothy Ebinger, Sascha Kaden, Shawna Thomas, Robert Andre, Nancy M. Amato, and Ulrike Thomas, In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 1-8, Brisbane, Australia, May 2018. DOI: 10.1109/ICRA.2018.8460483
Keywords: Assembly, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{ebinger2018general,
title={A General and Flexible Search Framework for Disassembly Planning},
author={Ebinger, Timothy and Kaden, Sascha and Thomas, Shawna and Andre, Robert and Amato, Nancy M and Thomas, Ulrike},
booktitle={2018 IEEE International Conference on Robotics and Automation (ICRA)},
pages={1--8},
year={2018},
organization={IEEE}
}


Abstract

We present a new general framework for disassembly sequence planning. This framework is versatile allowing different types of search schemes (exhaustive vs. preemptive), various part separation techniques, and the ability to group parts, or not, into subassemblies to improve the solution efficiency and parallelism. This enables a truly hierarchical approach to disassembly sequence planning. We demonstrate two different search strategies using this framework that can either yield a single solution quickly or provide a spectrum of solutions from which an optimal may be selected. We also develop a method for subassembly identification based on collision information. Our results show improved performance over an iterative motion planning based method for finding a single solution and greater functionality through hierarchical planning and optimal solution search.


A General Region-Based Framework for Collaborative Planning, Jory Denny, Read Sandstrom, Nancy M. Amato, In Proc. Inter. Symp. on Robotics Research (ISRR 2015). Springer Proceedings in Advanced Robotics, Vol: 3, pp. 563-579, Genova, Italy, Jan 2018. DOI: 10.1007/978-3-319-60916-4_32
Keywords: Collaborative Planning, Motion Planning, Workspace Topology
Links : [Published]

BibTex

@Inbook{Denny2018,
author="Denny, Jory
and Sandstr{\"o}m, Read
and Amato, Nancy M.",
editor="Bicchi, Antonio
and Burgard, Wolfram",
title="A General Region-Based Framework for Collaborative Planning",
bookTitle="Robotics Research: Volume 2",
year="2018",
publisher="Springer International Publishing",
address="Cham",
pages="563--579",
abstract="Sampling-based planning is a common method for solving motion planning problems. However, this paradigm falters in difficult scenarios, such as narrow passages. In contrast, humans can frequently identify these challenges and can sometimes propose an approximate solution. A recent method called Region Steering takes advantage of this intuition by allowing a user to define regions in the workspace to weight the search space for probabilistic roadmap planners. In this work, we extend Region Steering into a generalized Region-Based framework that is suitable for any sampling-based planning approach. We explore three variants of our framework for graph-based, tree-based, and hybrid planning methods. We evaluate these variants in simulations as a proof of concept. Our results demonstrate the benefits of our framework in reducing overall planning time.",
isbn="978-3-319-60916-4",
doi="10.1007/978-3-319-60916-4_32",
url="https://doi.org/10.1007/978-3-319-60916-4_32"
}


Abstract

Sampling-based planning is a common method for solving motion planning problems. However, this paradigm falters in difficult scenarios, such as narrow passages. In contrast, humans can frequently identify these challenges and can sometimes propose an approximate solution. A recent method called Region Steering takes advantage of this intuition by allowing a user to define regions in the workspace to weight the search space for probabilistic roadmap planners. In this work, we extend Region Steering into a generalized Region-Based framework that is suitable for any sampling-based planning approach. We explore three variants of our framework for graph-based, tree-based, and hybrid planning methods. We evaluate these variants in simulations as a proof of concept. Our results demonstrate the benefits of our framework in reducing overall planning time.


Bounded Asynchrony and Nested Parallelism for Scalable Graph Processing, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. Supercomputing (SC), Doctoral Showcase Poster, Denver, CO, Nov 2017.
Keywords: Approximate Algorithms, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

N/A


Abstract

Processing large-scale graphs has become a critical component in a variety of fields, from scientific computing to social analytics. The irregular access pattern for graph workloads, coupled with complex graph structures and large data sizes makes efficiently executing parallel graph workloads challenging. In this dissertation, we develop two broad techniques for improving the performance of general parallel graph algorithms: bounded asynchrony and nested parallelism. Increasing asynchrony in a bounded manner allows one to avoid costly global synchronization at scale, while still avoiding the penalty of unbounded asynchrony including redundant work. Using this technique, we scale a BFS workload to 98,304 cores where traditional methods stop scaling. Additionally, asynchrony enables a new family of approximate algorithms for applications tolerant to fixed amounts of error. Representing graph algorithms in a nested parallel manner enables the full use of available parallelism inherent in graph algorithms, while efficiently managing communication through nested sections.


Manipulation Planning with Directed Reachable Volumes, Troy McMahon, Read Sandstrom, Shawna Thomas, Nancy M. Amato, Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 4026-4033, Vancouver, Canada, Sep 2017. DOI: 10.1109/IROS.2017.8206257
Keywords: Manipulation Planning, Reachability Analysis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{mcmahon2017manipulation,
title={Manipulation planning with directed reachable volumes},
author={McMahon, Troy and Sandstr{\"o}m, Read and Thomas, Shawna and Amato, Nancy M},
booktitle={2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages={4026--4033},
year={2017},
organization={IEEE}
}


Abstract

Motion planning for manipulators with rotational joints is challenging because the actuation range for each link is constrained by the placement and orientation of other links. Thus, finding paths that avoid self-collision is non-trivial. However, rotational joints are often used in industrial robots. We develop a reparameterization of the planning problem called directed reachable volumes that provides an explicit representation of the workspace regions that the joints and end effectors can reach given the placement and orientation of other links. This formulation, while similar in spirit to prior reachable volume work, does not rely on the same restrictive assumptions that preclude prior work from handling rotational joints. We provide primitive planning operations that can be used in the context of state-of-the-art motion planning methods. We present experimental validation of directed reachable volumes by demonstrating a simulated pick-and-place scenario using realistic robots with rotational joints.


Fast Approximate Distance Queries in Unweighted Graphs using Bounded Asynchrony, Adam Fidel, Francisco Coral, Colton Riedel, Nancy M. Amato, Lawrence Rauchwerger, Workshop on Languages and Compilers for Parallel Computing (LCPC 2016). Lecture Notes in Computer Science, vol 10136. Springer, Cham., pp. 40-54, Rochester NY, USA, Jan 2017. DOI: 10.1007/978-3-319-52709-3_4
Keywords: Approximate Algorithms, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@inproceedings{
author="Adam Fidel and Francisco Coral Sabido and Colton Riedel and Nancy M. Amato and Lawrence Rauchwerger",
title="Fast Approximate Distance Queries in Unweighted Graphs using Bounded Asynchrony",
year=2017,
booktitle="Languages and Compilers for Parallel Computing (LCPC 2016)",
series="Lecture Notes in Computer Science",
volume="10136",
publisher="Springer, Cham",
note = "DOI: https://doi.org/10.1007/978-3-319-52709-3_4"
}


Abstract

We introduce a new parallel algorithm for approximate breadth-first ordering of an unweighted graph by using bounded asynchrony to parametrically control both the performance and error of the algorithm. This work is based on the k-level asynchronous (KLA) paradigm that trades expensive global synchronizations in the level-synchronous model for local synchronizations in the asynchronous model, which may result in redundant work. Instead of correcting errors introduced by asynchrony and redoing work as in KLA, in this work we control the amount of work that is redone and thus the amount of error allowed, leading to higher performance at the expense of a loss of precision. Results of an implementation of this algorithm are presented on up to 32,768 cores, showing 2.27x improvement over the exact KLA algorithm and 3.8x improvement over the level-synchronous version with minimal error on several graph inputs.


On the theory of user-guided planning, Jory Denny, Jonathan Colbert, Qin Hongsen, Nancy M. Amato, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4794-4801, Oct 2016. DOI: 10.1109/IROS.2016.7759704
Keywords: Path Planning, Robotics, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{7759704,
author={J. {Denny} and J. {Colbert} and H. {Qin} and N. M. {Amato}},
booktitle={2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
title={On the theory of user-guided planning},
year={2016},
volume={},
number={},
pages={4794-4801},
doi={10.1109/IROS.2016.7759704}}


Abstract

Sampling-based techniques are often employed to solve various complex motion planning problems -the problem of computing a valid path under various robot and/or obstacle constraints. As these methods are random in nature, the probability of their success is directly related to the expansiveness, or openness, of the underlying planning space. However, little is known theoretically in qualifying the conditions under which user (human)-guided approaches improve the efficiency of sampling-based planners. In this paper, we classify and create simplistic models of common user-guided approaches, and we extend the concept of expansiveness to analyze these models to understand both when and how much user-guidance aids sampling-based planners.


On the theory of user-guided planning, Jory Denny, Jonathan Colbert, Qin Hongsen, Nancy M. Amato, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4794-4801, Oct 2016. DOI: 10.1109/IROS.2016.7759704
Keywords: Path Planning, Robotics, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{7759704,
author={J. {Denny} and J. {Colbert} and H. {Qin} and N. M. {Amato}},
booktitle={2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
title={On the theory of user-guided planning},
year={2016},
volume={},
number={},
pages={4794-4801},
doi={10.1109/IROS.2016.7759704}}


Abstract

Sampling-based techniques are often employed to solve various complex motion planning problems -the problem of computing a valid path under various robot and/or obstacle constraints. As these methods are random in nature, the probability of their success is directly related to the expansiveness, or openness, of the underlying planning space. However, little is known theoretically in qualifying the conditions under which user (human)-guided approaches improve the efficiency of sampling-based planners. In this paper, we classify and create simplistic models of common user-guided approaches, and we extend the concept of expansiveness to analyze these models to understand both when and how much user-guidance aids sampling-based planners.


Motion Planning using Hierarchical Aggregation of Workspace Obstacles, Mukulika Ghosh, Shawna Thomas, Marco Morales, Sam Rodriguez, and Nancy M. Amato, In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 5716-5721, Daejeon, Korea, Oct 2016. DOI: 10.1109/IROS.2016.7759841
Keywords: Motion Planning, Workspace Topology
Links : [Published]

BibTex

@inproceedings{ghosh2016motion,
title={Motion planning using hierarchical aggregation of workspace obstacles},
author={Ghosh, Mukulika and Thomas, Shawna and Morales, Marco and Rodriguez, Sam and Amato, Nancy M},
booktitle={2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages={5716--5721},
year={2016},
organization={IEEE}
}


Abstract

Sampling-based motion planning is the state-of-the-art technique for solving challenging motion planning problems in a wide variety of domains. While generally successful, their performance suffers from increasing problem complexity. In many cases, the full problem complexity is not needed for the entire solution. We present a hierarchical aggregation framework that groups and models sets of obstacles based on the currently needed level of detail. The hierarchy enables sampling to be performed using the simplest and most conservative representation of the environment possible in that region. Our results show that this scheme improves planner performance irrespective of the underlying sampling method and input problem. In many cases, improvement is significant, with running times often less than 60% of the original planning time.


Sampling Based Motion Planning with Reachable Volumes, Troy McMahon, Doctoral dissertation, Texas A & M University, College Station, Texas, USA, Aug 2016.
Keywords: Manipulation Planning, Reachability Analysis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@phdthesis{mcmahon2016sampling,
title={Sampling based motion planning with reachable volumes},
author={McMahon, Troy Anthony},
year={2016}
}


Abstract

Motion planning for constrained systems is a version of the motion planning problem in which the motion of a robot is limited by constraints. For example, one can require that a humanoid robot such as a PR2 remain upright by constraining its torso to be above its base or require that an object such as a bucket of water remain upright by constraining the vertices of the object to be parallel to the robot’s base. Grasping can be modeled by requiring that the end effectors of the robot be located at specified handle positions. Constraints might require that the robot remain in contact with a surface, or that certain joints of the robot remain in contact with each other (e.g., closed chains). Such problems are particularly difficult because the constraints form a manifold in C-space, and planning must be restricted to this manifold. High degree of freedom motion planning and motion planning for constrained systems has applications in parallel robotics, grasping and manipulation, computational biology and molecular simulations, and animation. In this work, we introduce a new concept, reachable volumes, that are a geometric representation of the regions the joints and end effectors of a robot can reach, and use it to define a new planning space, called RV-space, where all points automatically satisfy a problem’s constraints. Visualizations of reachable volumes can enable operators to see the regions of workspace that different parts of the robot can reach. Samples and paths generated in RV-space naturally conform to constraints, making planning for constrained systems no more difficult than planning for unconstrained systems. Consequently, constrained motion planning problems that were previously difficult or unsolvable become manageable and in many cases trivial. We provide tools and techniques to extend the state of the art sampling based motion planning algorithms to RV-space. We define a reachable volume sampler, a reachable volume local planner and a reachable volume distance metric. We showcase the effectiveness of RV-space by applying these tools to motion planning problems for robots with constraints on the end effectors and/or internal joints of the robot. We show that RV-based planners are more efficient than existing methods, particularly for higher dimensional problems, solving problems with 1000+ degrees of freedom for multi-loop, and tree-like linkages.


Improved Sampling Based Motion Planning Through Local Learning, Chinwe Pamela Ekenna, Improved Sampling Based Motion Planning Through Local Learning, Aug 2016.
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@phdthesis{ekenna2016improved,
title={Improved sampling based motion planning through local learning},
author={Ekenna, Chinwe Pamela},
year={2016}
}


Abstract

Every motion made by a moving object is either planned implicitly, e.g., human natural movement from one point to another, or explicitly, e.g., pre-planned information about where a robot should move in a room to effectively avoid colliding with obstacles. Motion planning is a well-studied concept in robotics and it involves moving an object from a start to goal configuration. Motion planning arises in many application domains such as robotics, computer animation (digital actors), intelligent CAD (virtual prototyping and training) and even computational biology (protein folding and drug design). Interestingly, a single class of planners, sampling-based planners have proven effective in all these domains. Probabilistic Roadmap Methods (PRMs) are one type of sampling-based planners that sample robot configurations (nodes) and connect them via viable local paths (edges) to form a roadmap containing representative feasible trajectories. The roadmap is then queried to find solution paths between start and goal configurations. Different PRM strategies perform differently given different input parameters, e.g., workspace environments and robot definitions. Motion planning, however, is computationally hard – it requires geometric path planning which has been shown to be PSPACE hard, complex representational issues for robots with known physical, geometric and temporal constraints, and challenging mapping/representing requirements for the workspace environment. Many important environments, e.g., houses, factories and airports, are heterogeneous, i.e., contain free, cluttered and narrow spaces. Heterogeneous environments, however, introduce a new set of problems for motion planning and PRM strategies because there is no ideal method suitable for all regions in the environment. In this work we introduce a technique that can adapt and apply PRM methods suitable for local regions in an environment. The basic strategy is to first identify a local region of the environment suitable for the current action based on identified neighbors. Next, based on past performance of methods in this region, adapt and pick a method to use at this time. This selection and adaptation is done by applying machine learning. By performing the local region creation in this dynamic fashion, we remove the need to explicitly partition the environment as was done in previous methods and which is difficult to do, slows down performance and includes the difficult process of determining what strategy to use even after making an explicit partitioning. Our method handles and removes these overheads. We show benefits of this approach in both planning robot motions and in protein folding simulations. We perform experiments on robots in simulation with different degrees of freedom and varying levels of heterogeneity in the environment and show an improvement in performance when our local learning method is applied. Protein folding simulations were performed on 23 proteins and we note an improvement in the quality of pathways produced with comparable performance in terms of time needed to build the roadmap.


Adaptive Local Learning in Sampling Based Motion Planning for Protein Folding, Chinwe Ekenna and Shawna Thomas and Nancy M. Amato, BMC Syst Biol, Special Issue from the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Vol: 10, Issue: 49, pp. 165-179, Aug 2016. DOI: 10.1186/s12918-016-0297-9
Keywords: Computational Biology, Protein Folding
Links : [Published]

BibTex

@article{ekenna-bmcsb-2016
, author = "Chinwe Ekenna and Shawna Thomas and Nancy M. Amato"
, title = "Adaptive Local Learning in Sampling Based Motion Planning for Protein Folding"
, journal = "BMC Syst Biol, Special Issue from IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"
, volume = "10"
, number = "49"
, pages = "165--179"
, year = "2016"
, month = "August"
, doi = "10.1186/s12918-016-0297-9"
}


Abstract

Background Simulating protein folding motions is an important problem in computational biology. Motion planning algorithms, such as Probabilistic Roadmap Methods, have been successful in modeling the folding landscape. Probabilistic Roadmap Methods and variants contain several phases (i.e., sampling, connection, and path extraction). Most of the time is spent in the connection phase and selecting which variant to employ is a difficult task. Global machine learning has been applied to the connection phase but is inefficient in situations with varying topology, such as those typical of folding landscapes. Results We develop a local learning algorithm that exploits the past performance of methods within the neighborhood of the current connection attempts as a basis for learning. It is sensitive not only to different types of landscapes but also to differing regions in the landscape itself, removing the need to explicitly partition the landscape. We perform experiments on 23 proteins of varying secondary structure makeup with 52–114 residues. We compare the success rate when using our methods and other methods. We demonstrate a clear need for learning (i.e., only learning methods were able to validate against all available experimental data) and show that local learning is superior to global learning producing, in many cases, significantly higher quality results than the other methods. Conclusions We present an algorithm that uses local learning to select appropriate connection methods in the context of roadmap construction for protein folding. Our method removes the burden of deciding which method to use, leverages the strengths of the individual input methods, and it is extendable to include other future connection methods.


The Impact of Approximate Methods on Local Learning in Motion Planning, Diane Uwacu, Chinwe Ekenna, Shawna Thomas, Nancy Amato, 1st International Workshop on Robot Learning and Planning (RLP 2016), in conjunction with RSS 2016, pp. 38-44, Ann Arbor, MI, Jun 2016.
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [ArXiv] [Manuscript]

BibTex

@inproceedings{uwacu-wrlp-2016
, author = "Diane Uwacu and Chinwe Ekenna and Shawna Thomas and Nancy M. Amato"
, title = "The Impact of Approximate Methods on Local Learning in Motion Planning"
, booktitle = "1st International Workshop on Robot Learning and Planning (RLP 2016), in conjunction with RSS 2016"
, month = "June"
, year = "2016"
, note = "http://chitsazlab.org/robotics/rlp2016/RLP_2016_paper_11.html"
}


Abstract

Machine learning methods have been applied to many motion planning algorithms including probabilistic roadmap methods (PRM). There are many variants of these methods and choosing the best one every time is hard and depends on local properties of the environment. A successful learning approach has been developed to offset this issue. This learning approach was applied to PRMs to help decide intelligently what method to utilize in dynamically created local regions of the environment or task space. It used exact neighbor finding approaches and removed the need to partition environments to get improved results. In this work we make further advances by introducing approximate neighbor finder methods. It has been established that approximate neighbor finding methods are faster than exact methods, still work well in connecting nodes to edges in PRMs, and that connection is robust to noise. We study what happens when noise is introduced into learning by using approximate methods instead of already studied exact methods. We show that the impact of noise on learning depends on how much learning needs to take place given the topology of the environment. Our results demonstrate a correlation between heterogeneity and the need for learning over a local region.


MPMD Framework for Offloading Load Balance Computation, Olga Pearce, Todd Gamblin, Bronis R. De Supinski, Martin Schulz, Nancy M. Amato, 2016 IEEE International Parallel and Distributed Processing Symposium, pp. 943-952, May 2016. DOI: 10.1109/IPDPS.2016.16
Keywords: Parallel Graph Algorithms
Links : [Published]

BibTex

@inproceedings{DBLP:conf/ipps/PearceGSSA16,
author = {Olga Pearce and
Todd Gamblin and
Bronis R. de Supinski and
Martin Schulz and
Nancy M. Amato},
title = {{MPMD} Framework for Offloading Load Balance Computation},
booktitle = {2016 {IEEE} International Parallel and Distributed Processing Symposium,
{IPDPS} 2016, Chicago, IL, USA, May 23-27, 2016},
pages = {943--952},
publisher = {{IEEE} Computer Society},
year = {2016},
url = {https://doi.org/10.1109/IPDPS.2016.16},
doi = {10.1109/IPDPS.2016.16},
timestamp = {Wed, 16 Oct 2019 14:14:51 +0200},
biburl = {https://dblp.org/rec/conf/ipps/PearceGSSA16.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

In many parallel scientific simulations, work is assigned to processors by decomposing a spatial domain consisting of mesh cells, particles, or other elements. When work per element changes, simulations can use dynamic load balance algorithms to distribute work to processors evenly. Typical SPMD simulations wait while a load balance algorithm runs on all processors, but this algorithm can itself become a bottleneck. We propose a novel approach based on two key observations: (1) application state typically changes slowly in SPMD physics simulations, so work assignments computed in the past still produce good load balance in the future, (2) we can decouple the load balance algorithm so that it runs concurrently with the application and more efficiently on a smaller number of processors. We then apply the work assignment "late", once it has been computed. We call this approach lazy load balancing. In this paper, we show that the rate of change in work distribution is slow for a Barnes-Hut benchmark and for ParaDiS, a dislocation dynamics simulation. We implement an MPMD framework to exploit this property to save resources by running a load balancing algorithm at higher parallel efficiency on a smaller number of processors. Using our framework, we explore the trade-offs of lazy load balancing and demonstrate performance improvements of up to 46%.


Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory, Ioannis Papadopoulos, Nathan Thomas, Adam Fidel, Dielli Hoxha, Nancy M. Amato, Lawrence Rauchwerger, In Wkshp. on Lang. and Comp. for Par. Comp. (LCPC), pp. 106-121, Chapel Hill, NC, Feb 2016. DOI: 10.1007/978-3-319-29778-1_7
Keywords: Runtime Systems, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-319-29778-1_7,
author="Papadopoulos, Ioannis
and Thomas, Nathan
and Fidel, Adam
and Hoxha, Dielli
and Amato, Nancy M.
and Rauchwerger, Lawrence",
editor="Shen, Xipeng
and Mueller, Frank
and Tuck, James",
title="Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory",
booktitle="Languages and Compilers for Parallel Computing",
year="2016",
publisher="Springer International Publishing",
address="Cham",
pages="106--121",
abstract="Nested parallelism is of increasing interest for both expressivity and performance. Many problems are naturally expressed with this divide-and-conquer software design approach. In addition, programmers with target architecture knowledge employ nested parallelism for performance, imposing a hierarchy in the application to increase locality and resource utilization, often at the cost of implementation complexity.",
isbn="978-3-319-29778-1"
}


Abstract

Nested parallelism is of increasing interest for both expressivity and performance. Many problems are naturally expressed with this divide-and-conquer software design approach. In addition, programmers with target architecture knowledge employ nested parallelism for performance, imposing a hierarchy in the application to increase locality and resource utilization, often at the cost of implementation complexity. While dynamic applications are a natural fit for the approach, support for nested parallelism in distributed systems is generally limited to well-structured applications engineered with distinct phases of intra-node computation and inter-node communication. This model makes expressing irregular applications difficult and also hurts performance by introducing unnecessary latency and synchronizations. In this paper we describe an approach to asynchronous nested parallelism which provides uniform treatment of nested computation across distributed memory. This approach allows efficient execution while supporting dynamic applications which cannot be mapped onto the machine in the rigid manner of regular applications. We use several graph algorithms as examples to demonstrate our library’s expressivity, flexibility, and performance.


Planning Motions for Shape-Memory Alloy Sheets, Mukulika Ghosh, Daniel Tomkins, Jory Denny, Sam Rodriguez, Marco Morales Aguirre, Nancy M. Amato, Origami, Vol: 6, Issue: 6, pp. 501-511, Dec 2015. DOI: 10.1090/MBK/095.2/13
Keywords: Computational Geometry, Motion Planning
Links : [Published]

BibTex

@inproceedings{Ghosh2015PlanningMF,
title={Planning motions for shape-memory alloy sheets},
author={Mukulika Ghosh and D. Tomkins and J. Denny and S. Rodr{\'i}guez and M. Morales and N. Amato},
year={2015}
}


Abstract

Shape Memory Alloys (SMAs) are smart materials that can remember predefined shapes. A deformed SMA can transition to a trained shape by applying temperature changes to portions of the material. This reconfigurable property allows SMAs to be used in aeronautics, medicine, and other fields where dynamic re-engineering or actuation of components is required. In this work, we plan the motion of an SMA robot modeled as inflexible regions connected by flexible joints. In this work, we adapt an existing state-of-the-art motion planning algorithm to model the folding of an SMA robot from an unfolded flat state to a folded shape under feasibility constraints such as collision free motion and gravitational stability. Our results validate our model and algorithm by folding interesting 3D shapes using gravitationally stable motions, show flexibility in modeling various planning problems and significantly improved motions in comparable time to not using stability constraints.


An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms (Conference Best Paper Finalist), Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. IEEE Int.Conf. on Parallel Architectures and Compilation Techniques (PACT), pp. 201-212, San Francisco, CA, Oct 2015. DOI: 10.1109/PACT.2015.34
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@INPROCEEDINGS{7429306,
author={ {Harshvardhan} and A. {Fidel} and N. M. {Amato} and L. {Rauchwerger}},
booktitle={2015 International Conference on Parallel Architecture and Compilation (PACT)},
title={An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms},
year={2015},
volume={},
number={},
pages={201-212},}


Abstract

Graph algorithms on distributed-memory systems typically perform heavy communication, often limiting their scalability and performance. This work presents an approach to transparently (without programmer intervention) allow fine-grained graph algorithms to utilize algorithmic communication reduction optimizations. In many graph algorithms, the same information is communicated by a vertex to its neighbors, which we coin algorithmic redundancy. Our approach exploits algorithmic redundancy to reduce communication between vertices located on different processing elements. We employ algorithm-aware coarsening of messages sent during vertex visitation, reducing both the number of messages and the absolute amount of communication in the system. To achieve this, the system structure is represented by a hierarchical graph, facilitating communication optimizations that can take into consideration the machine's memory hierarchy. We also present an optimization for small-world scale-free graphs wherein hub vertices (i.e., vertices of very large degree) are represented in a similar hierarchical manner, which is exploited to increase parallelism and reduce communication. Finally, we present a framework that transparently allows fine-grained graph algorithms to utilize our hierarchical approach without programmer intervention, while improving scalability and performance. Experimental results of our proposed approach on 131,000+ cores show improvements of up to a factor of 8 times over the non-hierarchical version for various graph mining and graph analytics algorithms.


Improved Roadmap Connection via Local Learning for Sampling Based Planners, Chinwe Ekenna, Diane Uwacu, Shawna Thomas, Nancy Amato, Proc. IEEE/RSJ Int. Conf. Intel. Rob. Syst. (IROS), pp. 3227-3234, Hamburg, Germany, Oct 2015. DOI: 10.1109/IROS.2015.7353825
Keywords: Machine Learning, Motion Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{7353825, author={C. {Ekenna} and D. {Uwacu} and S. {Thomas} and N. M. {Amato}}, booktitle={2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, title={Improved roadmap connection via local learning for sampling based planners}, year={2015}, volume={}, number={}, pages={3227-3234},}


Abstract

Probabilistic Roadmap Methods (PRMs) solve the motion planing problem by constructing a roadmap (or graph) that models the motion space when feasible local motions exist. PRMs and variants contain several phases during roadmap generation i.e., sampling, connection, and query. Some work has been done to apply machine learning to the connection phase to decide which variant to employ, but it uses a global learning approach that is inefficient in heterogeneous situations. We present an algorithm that instead uses local learning: it only considers the performance history in the vicinity of the current connection attempt and uses this information to select good candidates for connection. It thus removes any need to explicitly partition the environment which is burdensome and typically difficult to do. Our results show that our method learns and adapts in heterogeneous environments, including a KUKA youBot with a fixed and mobile base. It finds solution paths faster for single and multi-query scenarios and builds roadmaps with better coverage and connectivity given a fixed amount of time in a wide variety of input problems. In all cases, our method outperforms the previous adaptive connection method and is comparable or better than the best individual method.


Studying Learning Techniques in Different Phases of PRM Construction, Chinwe Ekenna, Diane Uwacu, Shawna Thomas, Nancy Amato, In Machine Learning in Planning and Control of Robot Motion Workshop (IROS-MLPC), Hamburg, Germany, Oct 2015. DOI: 10.1109/IROS.2015.7353825
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

no bib for now.


Abstract

Probabilistic Roadmap Methods (PRMs) solve the motion planning problem in two phases by sampling free configurations and connecting them together to build a map that is used to find a valid path. Existing algorithms are highly sensitive to the topology of the problem, and their efficiency depends on applying them to a compatible problem. Reinforcement learning has been applied to motion planning and rewards the action performed by planners during either sampling or connection, but not both. Previous work computed a global reward and action scheme, which saw a setback when heterogeneous environments were concerned. Local learning (connection) was recently introduced to offset this weakness identified during global learning, and there was some improvement in planner performance. These different learning schemes (global and local) have shown strengths and weaknesses individually. In this paper, we investigate local learning for sampling. We study what type of learning to apply when, and how the two phases of PRM roadmap construction interact, which has not been investigated before. We show the performance using each scheme on a KUKAyouBot, an 8 degree of freedom robot, and analyze what happens when they are all combined during roadmap construction


Decoy Database Improvement for Protein Folding, Hsin-Yi (Cindy) Yeh, Aaron Lindsey, Chih-Peng Wu, Shawna Thomas, and Nancy M. Amato, Journal of Computational Biology, Vol: 22, Issue: 9, pp. 823-836, Sep 2015. DOI: 10.1089/cmb.2015.0116
Keywords: Computational Biology, Protein Folding
Links : [Published]

BibTex

@article{yeh-jcb-2015
, author = "Hsin-Yi Yeh and Aaron Lindsey and Chih-Peng Wu and Shawna Thomas and Nancy M. Amato"
, title ="Decoy Database Improvement for Protein Folding"
, journal = "Journal of Computational Biology"
, volume = "22"
, number = "9"
, year = "2015"
, pages = "823-836"
, doi = "10.1089/cmb.2015.0116"
}


Abstract

Predicting protein structures and simulating protein folding are two of the most important problems in computational biology today. Simulation methods rely on a scoring function to distinguish the native structure (the most energetically stable) from non-native structures. Decoy databases are collections of non-native structures used to test and verify these functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and removing redundant structures. We test our approach on 20 different decoy databases of varying size and type and show significant improvement across a variety of metrics. We also test our improved databases on two popular modern scoring functions and show that for most cases they contain a greater or equal number of native-like structures than the original databases, thereby producing a more rigorous database for testing scoring functions.


Finding schedule-sensitive branches, Jeff Huang, Lawrence Rauchwerger, Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, (ESEC/FSE), pp. 439-449, Bergamo, Italy, Aug 2015. DOI: 10.1145/2786805.2786840
Keywords: Adaptive Algorithm
Links : [Published]

BibTex

@inproceedings{DBLP:conf/sigsoft/HuangR15,
author = {Jeff Huang and
Lawrence Rauchwerger},
editor = {Elisabetta Di Nitto and
Mark Harman and
Patrick Heymans},
title = {Finding schedule-sensitive branches},
booktitle = {Proceedings of the 2015 10th Joint Meeting on Foundations of Software
Engineering, {ESEC/FSE} 2015, Bergamo, Italy, August 30 - September
4, 2015},
pages = {439--449},
publisher = {{ACM}},
year = {2015},
url = {https://doi.org/10.1145/2786805.2786840},
doi = {10.1145/2786805.2786840},
timestamp = {Tue, 06 Nov 2018 16:59:22 +0100},
biburl = {https://dblp.org/rec/conf/sigsoft/HuangR15.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

This paper presents an automated, precise technique, TAME, for identifying schedule-sensitive branches (SSBs) in concurrent programs, i.e., branches whose decision may vary depending on the actual scheduling of concurrent threads. The technique consists of 1) tracing events at fine-grained level; 2) deriving the constraints for each branch; and 3) invoking an SMT solver to find possible SSB, by trying to solve the negated branch condition. To handle the infeasibly huge number of computations that would be generated by the fine-grained tracing, TAME leverages concolic execution and implements several sound approximations to delimit the number of traces to analyse, yet without sacrificing precision. In addition, TAME implements a novel distributed trace partition approach distributing the analysis into smaller chunks. Evaluation on both popular benchmarks and real applications shows that TAME is effective in finding SSBs and has good scalability. TAME found a total of 34 SSBs, among which 17 are related to concurrency errors, and 9 are ad hoc synchronizations.


Composing Algorithmic Skeletons to Express High-Performance Scientific Applications, Mani Zandifar, Mustafa Abdujabbar, Alireza Majidi, David Keyes, Nancy M. Amato, Lawrence Rauchwerger, In Proc. ACM Int. Conf. Supercomputing (ICS), pp. 415-424, Newport Beach, CA, USA. Conference Best Paper Award., Jun 2015. DOI: 10.1145/2751205.2751241
Keywords: Algorithmic Skeletons, Data Flow, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/2751205.2751241,
author = {Zandifar, Mani and Abdul Jabbar, Mustafa and Majidi, Alireza and Keyes, David and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {Composing Algorithmic Skeletons to Express High-Performance Scientific Applications},
year = {2015},
isbn = {9781450335591},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2751205.2751241},
doi = {10.1145/2751205.2751241},
abstract = {Algorithmic skeletons are high-level representations for parallel programs that hide the underlying parallelism details from program specification. These skeletons are defined in terms of higher-order functions that can be composed to build larger programs. Many skeleton frameworks support efficient implementations for stand-alone skeletons such as map, reduce, and zip for both shared-memory systems and small clusters. However, in these frameworks, expressing complex skeletons that are constructed through composition of fundamental skeletons either requires complete reimplementation or suffers from limited scalability due to required global synchronization. In the STAPL Skeleton Framework, we represent skeletons as parametric data flow graphs and describe composition of skeletons by point-to-point dependencies of their data flow graph representations. As a result, we eliminate the need for reimplementation and global synchronizations in composed skeletons. In this work, we describe the process of translating skeleton-based programs to data flow graphs and define rules for skeleton composition. To show the expressivity and ease of use of our framework, we show skeleton-based representations of the NAS EP, IS, and FT benchmarks. To show reusability and applicability of our framework on real-world applications we show an N-Body application using the FMM (Fast Multipole Method) hierarchical algorithm. Our results show that expressivity can be achieved without loss of performance even in complex real-world applications.},
booktitle = {Proceedings of the 29th ACM on International Conference on Supercomputing},
pages = {415–424},
numpages = {10},
keywords = {high-performance computing, data flow programming, distributed systems, algorithmic skeletons, patterns},
location = {Newport Beach, California, USA},
series = {ICS '15}
}


Abstract

Algorithmic skeletons are high-level representations for parallel programs that hide the underlying parallelism details from program specification. These skeletons are defined in terms of higher-order functions that can be composed to build larger programs. Many skeleton frameworks support efficient implementations for stand-alone skeletons such as map, reduce, and zip for both shared-memory systems and small clusters. However, in these frameworks, expressing complex skeletons that are constructed through composition of fundamental skeletons either requires complete reimplementation or suffers from limited scalability due to required global synchronization. In the STAPL Skeleton Framework, we represent skeletons as parametric data flow graphs and describe composition of skeletons by point-to-point dependencies of their data flow graph representations. As a result, we eliminate the need for reimplementation and global synchronizations in composed skeletons. In this work, we describe the process of translating skeleton-based programs to data flow graphs and define rules for skeleton composition. To show the expressivity and ease of use of our framework, we show skeleton-based representations of the NAS EP, IS, and FT benchmarks. To show reusability and applicability of our framework on real-world applications we show an N-Body application using the FMM (Fast Multipole Method) hierarchical algorithm. Our results show that expressivity can be achieved without loss of performance even in complex real-world applications.


STAPL-RTS: An Application Driven Runtime System, Ioannis Papadopoulos, Nathan Thomas, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. ACM Int. Conf. Supercomputing (ICS), pp. 425-434, Newport Beach, CA, USA, Jun 2015. DOI: 10.1145/2751205.2751233
Keywords: Communication Library, Runtime Systems, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/2751205.2751233,
author = {Papadopoulos, Ioannis and Thomas, Nathan and Fidel, Adam and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {STAPL-RTS: An Application Driven Runtime System},
year = {2015},
isbn = {9781450335591},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2751205.2751233},
doi = {10.1145/2751205.2751233},
abstract = {Modern HPC systems are growing in complexity, as they move towards deeper memory hierarchies and increasing use of computational heterogeneity via GPUs or other accelerators. When developing applications for these platforms, programmers are faced with two bad choices. On one hand, they can explicitly manage all machine resources, writing programs decorated with low level primitives from multiple APIs (e.g. Hybrid MPI / OpenMP applications). Though seemingly necessary for efficient execution, it is an inherently non-scalable way to write software. Without a separation of concerns, only small programs written by expert developers actually achieve this efficiency. Furthermore, the implementations are rigid, difficult to extend, and not portable. Alternatively, users can adopt higher level programming environments to abstract away these concerns. Extensibility and portability, however, often come at the cost of lost performance. The mapping of a user's application onto the system now occurs without the contextual information that was immediately available in the more coupled approach.In this paper, we describe a framework for the transfer of high level, application semantic knowledge into lower levels of the software stack at an appropriate level of abstraction. Using the STAPL library, we demonstrate how this information guides important decisions in the runtime system (STAPL-RTS), such as multi-protocol communication coordination and request aggregation. Through examples, we show how generic programming idioms already known to C++ programmers are used to annotate calls and increase performance.},
booktitle = {Proceedings of the 29th ACM on International Conference on Supercomputing},
pages = {425–434},
numpages = {10},
keywords = {data flow, shared memory, remote method invocation, distributed memory, parallel programming, runtime systems, application driven optimizations},
location = {Newport Beach, California, USA},
series = {ICS '15}
}


Abstract

Modern HPC systems are growing in complexity, as they move towards deeper memory hierarchies and increasing use of computational heterogeneity via GPUs or other accelerators. When developing applications for these platforms, programmers are faced with two bad choices. On one hand, they can explicitly manage all machine resources, writing programs decorated with low level primitives from multiple APIs (e.g. Hybrid MPI / OpenMP applications). Though seemingly necessary for efficient execution, it is an inherently non-scalable way to write software. Without a separation of concerns, only small programs written by expert developers actually achieve this efficiency. Furthermore, the implementations are rigid, difficult to extend, and not portable. Alternatively, users can adopt higher level programming environments to abstract away these concerns. Extensibility and portability, however, often come at the cost of lost performance. The mapping of a user's application onto the system now occurs without the contextual information that was immediately available in the more coupled approach. In this paper, we describe a framework for the transfer of high level, application semantic knowledge into lower levels of the software stack at an appropriate level of abstraction. Using the STAPL library, we demonstrate how this information guides important decisions in the runtime system (STAPL-RTS), such as multi-protocol communication coordination and request aggregation. Through examples, we show how generic programming idioms already known to C++ programmers are used to annotate calls and increase performance.


A Hybrid Approach To Processing Big Data Graphs on Memory-Restricted Systems, Harshvardhan, Brandon West, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. Int. Par. and Dist. Proc. Symp. (IPDPS), pp. 799-808, Hyderabad, India, May 2015. DOI: 10.1109/IPDPS.2015.28
Keywords: Big Data, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@INPROCEEDINGS{7161566, author={ {Harshvardhan} and B. {West} and A. {Fidel} and N. M. {Amato} and L. {Rauchwerger}}, booktitle={2015 IEEE International Parallel and Distributed Processing Symposium}, title={A Hybrid Approach to Processing Big Data Graphs on Memory-Restricted Systems}, year={2015}, volume={}, number={}, pages={799-808},}


Abstract

With the advent of big-data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques that can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance. In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into sub graphs that fit in RAM and uses a paging-like technique to load sub graphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16, 000+ cores.


Reachable Volume RRT, Troy McMahon, Shawna L. Thomas, Nancy M. Amato, 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 2977-2984, Seattle, Washington, USA, May 2015. DOI: 10.1109/ICRA.2015.7139607
Keywords: Manipulation Planning, Reachability Analysis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{7139607, author={T. {McMahon} and S. {Thomas} and N. M. {Amato}}, booktitle={2015 IEEE International Conference on Robotics and Automation (ICRA)}, title={Reachable volume RRT}, year={2015}, volume={}, number={}, pages={2977-2984}, doi={10.1109/ICRA.2015.7139607}}


Abstract

Reachable volumes are a new technique that allows one to efficiently restrict sampling to feasible/reachable regions of the planning space even for high degree of freedom and highly constrained problems. However, they have so far only been applied to graph-based sampling-based planners. In this paper we develop the methodology to apply reachable volumes to tree-based planners such as Rapidly-Exploring Random Trees (RRTs). In particular, we propose a reachable volume RRT called RVRRT that can solve high degree of freedom problems and problems with constraints. To do so, we develop a reachable volume stepping function, a reachable volume expand function, and a distance metric based on these operations. We also present a reachable volume local planner to ensure that local paths satisfy constraints for methods such as PRMs. We show experimentally that RVRRTs can solve constrained problems with as many as 64 degrees of freedom and unconstrained problems with as many as 134 degrees of freedom. RVRRTs can solve problems more efficiently than existing methods, requiring fewer nodes and collision detection calls. We also show that it is capable of solving difficult problems that existing methods cannot.


A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms, Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger, In Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPOPP), pp. 285-286, San Francisco, CA, USA, Feb 2015. DOI: 10.1145/2688500.2700994
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/2688500.2700994,
author = {Harshvardhan and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms},
year = {2015},
isbn = {9781450332057},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2688500.2700994},
doi = {10.1145/2688500.2700994},
abstract = { Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5\texttimes{}-8\texttimes{} improvement for a variety of graph algorithms at 12,000+ cores. },
booktitle = {Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
pages = {285–286},
numpages = {2},
keywords = {Distributed Computing, Graph Analytics, Parallel Graph Processing, Big Data},
location = {San Francisco, CA, USA},
series = {PPoPP 2015}
}

@article{10.1145/2858788.2700994,
author = {Harshvardhan and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms},
year = {2015},
issue_date = {August 2015},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {50},
number = {8},
issn = {0362-1340},
url = {https://doi.org/10.1145/2858788.2700994},
doi = {10.1145/2858788.2700994},
abstract = { Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5\texttimes{}-8\texttimes{} improvement for a variety of graph algorithms at 12,000+ cores. },
journal = {SIGPLAN Not.},
month = jan,
pages = {285–286},
numpages = {2},
keywords = {Distributed Computing, Parallel Graph Processing, Big Data, Graph Analytics}
}


Abstract

Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5×-8× improvement for a variety of graph algorithms at 12,000+ cores.


Scalable conditional induction variables (CIV) analysis, Cosmin E. Oancea, Lawrence Rauchwerger, Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 213-224, San Francisco, California, USA, Feb 2015. DOI: 10.1109/CGO.2015.7054201
Keywords: Parallel Programming
Links : [Published]

BibTex

@inproceedings{DBLP:conf/cgo/OanceaR15,
author = {Cosmin E. Oancea and
Lawrence Rauchwerger},
editor = {Kunle Olukotun and
Aaron Smith and
Robert Hundt and
Jason Mars},
title = {Scalable conditional induction variables {(CIV)} analysis},
booktitle = {Proceedings of the 13th Annual {IEEE/ACM} International Symposium
on Code Generation and Optimization, {CGO} 2015, San Francisco, CA,
USA, February 07 - 11, 2015},
pages = {213--224},
publisher = {{IEEE} Computer Society},
year = {2015},
url = {https://doi.org/10.1109/CGO.2015.7054201},
doi = {10.1109/CGO.2015.7054201},
timestamp = {Wed, 16 Oct 2019 14:14:57 +0200},
biburl = {https://dblp.org/rec/conf/cgo/OanceaR15.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

Subscripts using induction variables that cannot be expressed as a formula in terms of the enclosing-loop indices appear in the low-level implementation of common programming abstractions such as Alter, or stack operations and pose significant challenges to automatic parallelization. Because the complexity of such induction variables is often due to their conditional evaluation across the iteration space of loops we name them Conditional Induction Variables (CIV). This paper presents a flow-sensitive technique that summarizes both such CIV-based and affine subscripts to program level, using the same representation. Our technique requires no modifications of our dependence tests, which is agnostic to the original shape of the subscripts, and is more powerful than previously reported dependence tests that rely on the pairwise disambiguation of read-write references. We have implemented the CIV analysis in our parallelizing compiler and evaluated its impact on five Fortran benchmarks. We have found that that there are many important loops using CIV subscripts and that our analysis can lead to their scalable parallelization. This in turn has led to the parallelization of the benchmark programs they appear in.


Efficient, Reachability-based, Parallel Algorithms for Finding Strongly Connected Components, Daniel Tomkins, Timmie Smith, Nancy M. Amato, Lawrence Rauchwerger, Technical Report, TR15-002, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, Jan 2015.
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

N/A


Abstract

Large, complex graphs are increasingly used to represent unstructured data in scientific applications. Applications such as discrete ordinates methods for radiation transport require these graphs to be directed with all cycles eliminated, where cycles are identified by finding the graph.s strongly connected components (SCCs). Deterministic parallel algorithms identify SCCs in polylogarithmic time, but the work of the algorithms on sparse graphs exceeds the linear sequential complexity bound. Randomized algorithms based on reachability — the ability to get from one vertex to another along a directed path — are generally favorable for sparse graphs, but either do not perform well on all graphs or introduce substantial overheads. This paper introduces two new algorithms, SCCMulti and SCCMulti-PriorityAware, which perform well on all graphs without significant overhead. We also provide a characterization that compares reachabilitybased SCC algorithms. Experimental results demonstrate scalability for both algorithms to 393,216 of cores on several types of graphs.


Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with Vertex Delegates, Roger Pearce, Maya Gokhale, Nancy M. Amato, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 549-559, New Orleans, Louisiana, USA, Nov 2014. DOI: 10.1109/SC.2014.50
Keywords: Parallel Graph Algorithms
Links : [Published]

BibTex

@inproceedings{DBLP:conf/sc/PearceGA14,
author = {Roger A. Pearce and
Maya B. Gokhale and
Nancy M. Amato},
editor = {Trish Damkroger and
Jack J. Dongarra},
title = {Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with
Vertex Delegates},
booktitle = {International Conference for High Performance Computing, Networking,
Storage and Analysis, {SC} 2014, New Orleans, LA, USA, November 16-21,
2014},
pages = {549--559},
publisher = {{IEEE} Computer Society},
year = {2014},
url = {https://doi.org/10.1109/SC.2014.50},
doi = {10.1109/SC.2014.50},
timestamp = {Mon, 20 Apr 2020 17:14:43 +0200},
biburl = {https://dblp.org/rec/conf/sc/PearceGA14.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

At extreme scale, irregularities in the structure of scale-free graphs such as social network graphs limit our ability to analyze these important and growing datasets. A key challenge is the presence of high-degree vertices (hubs), that leads to parallel workload and storage imbalances. The imbalances occur because existing partitioning techniques are not able to effectively partition high-degree vertices. We present techniques to distribute storage, computation, and communication of hubs for extreme scale graphs in distributed memory supercomputers. To balance the hub processing workload, we distribute hub data structures and related computation among a set of delegates. The delegates coordinate using highly optimized, yet portable, asynchronous broadcast and reduction operations. We demonstrate scalability of our new algorithmic technique using Breadth-First Search (BFS), Single Source Shortest Path (SSSP), K-Core Decomposition, and Page-Rank on synthetically generated scale-free graphs. Our results show excellent scalability on large scale-free graphs up to 131K cores of the IBM BG/P, and outperform the best known Graph500 performance on BG/P Intrepid by 15%.


Validation of Full-Domain Massively Parallel Transport Sweep Algorithms, eresa S. Bailey, W. Daryl Hawkins, Marvin L. Adams, Peter N. Brown, Adam J. Kunen, Michael P. Adams, Timmie Smith, Nancy M. Amato, and Lawrence Rauchwerger, Transactions of the American Nuclear Society, Vol: 111, Issue: 1, pp. 699-702, Anaheim, CA, USA, Nov 2014.
Keywords: Parallel Algorithms, Scientific Computing, STAPL
Links : [Published]

BibTex

@article{bailey--tans-2014
, author = "Teresa S. Bailey and W. Daryl Hawkins and Marvin L. Adams and Peter N. Brown and Adam J. Kunen and Michael P. Adams and Timmie Smith and Nancy M. Amato and Lawrence Rauchwerger"
, title = "Validation of Full-Domain Massively Parallel Transport Sweep Algorithms"
, journal = "Trans. American Nuclear Society"
, volume = "111"
, month = "November"
, year = "2014"
, pages = "699-702"
, note = "2014 ANS Annual Winter Meeting; Nov 9-13, 2014, Anaheim, CA, USA"
}


Abstract

The scalability of sweep algorithms has been a somewhat disputed topic recently. Some papers have predicted that these algorithms will scale if applied appropriately, while others have claimed that sweep algorithms have limited scalability. We compare the parallel performance models for sweep algorithms to actual computational performance using two distinctly different discrete ordinates transport codes. We have run both codes beyond 1Million MPI ranks on Lawrence Livermore National Laboratory’s (LLNL) BGQ machines (SEQUOIA and VULCAN).


The STAPL Skeleton Framework, Mani Zandifar, Nathan Thomas, Nancy M. Amato, Lawrence Rauchwerger, In Wkshp. on Lang. and Comp. for Par. Comp. (LCPC), pp. 176-190, Hillsboro, OR, USA, Sep 2014. DOI: 10.1007/978-3-319-17473-0_12
Keywords: Data Flow, Parallel Libraries, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-319-17473-0_12,
author="Zandifar, Mani
and Thomas, Nathan
and Amato, Nancy M.
and Rauchwerger, Lawrence",
editor="Brodman, James
and Tu, Peng",
title="The stapl Skeleton Framework",
booktitle="Languages and Compilers for Parallel Computing",
year="2015",
publisher="Springer International Publishing",
address="Cham",
pages="176--190",
abstract="This paper describes the stapl Skeleton Framework, a high-level skeletal approach for parallel programming. This framework abstracts the underlying details of data distribution and parallelism from programmers and enables them to express parallel programs as a composition of existing elementary skeletons such as map, map-reduce, scan, zip, butterfly, allreduce, alltoall and user-defined custom skeletons.",
isbn="978-3-319-17473-0"
}


Abstract

This paper describes the stapl Skeleton Framework, a high-level skeletal approach for parallel programming. This framework abstracts the underlying details of data distribution and parallelism from programmers and enables them to express parallel programs as a composition of existing elementary skeletons such as map, map-reduce, scan, zip, butterfly, allreduce, alltoall and user-defined custom skeletons. Skeletons in this framework are defined as parametric data flow graphs, and their compositions are defined in terms of data flow graph compositions. Defining the composition in this manner allows dependencies between skeletons to be defined in terms of point-to-point dependencies, avoiding unnecessary global synchronizations. To show the ease of composability and expressivity, we implemented the NAS Integer Sort (IS) and Embarrassingly Parallel (EP) benchmarks using skeletons and demonstrate comparable performance to the hand-optimized reference implementations. To demonstrate scalable performance, we show a transformation which enables applications written in terms of skeletons to run on more than 100,000 cores.


The anatomy of a distributed motion planning roadmap, Sam Ade Jacobs, Nancy M. Amato, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3019-3026, Chicago, Illinois, USA, Sep 2014. DOI: 10.1109/IROS.2014.6942979
Keywords: Motion Planning, Parallel Processing
Links : [Manuscript]

BibTex

@INPROCEEDINGS{6942979, author={S. A. {Jacobs} and N. M. {Amato}}, booktitle={2014 IEEE/RSJ International Conference on Intelligent Robots and Systems}, title={The anatomy of a distributed motion planning roadmap}, year={2014}, volume={}, number={}, pages={3019-3026}, doi={10.1109/IROS.2014.6942979}}


Abstract

In this paper, we evaluate and compare the quality and structure of roadmaps constructed from parallelizing sampling-based motion planning algorithms against that of roadmaps constructed using sequential planner. Also, we make an argument and provide experimental results that show that motion planning problems involving heterogenous environments (common in most realistic and large-scale motion planning) is a natural fit for spatial subdivision-based parallel processing. Spatial subdivision-based parallel processing approach is suited for heterogeneous environments because it allows for local adaption in solving a global problem while taking advantage of scalability that is possible with parallel processing.


Sampling Based Motion Planning with Reachable Volumes: Application to Manipulators and Closed Chain Systems, Troy McMahon, Shawna L. Thomas, Nancy M. Amato, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3705-3712, Chicago, Illinois, USA, Sep 2014. DOI: 10.1109/IROS.2014.6943082
Keywords: Manipulation Planning, Reachability Analysis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6943082, author={T. {McMahon} and S. {Thomas} and N. M. {Amato}}, booktitle={2014 IEEE/RSJ International Conference on Intelligent Robots and Systems}, title={Sampling based motion planning with reachable volumes: Application to manipulators and closed chain systems}, year={2014}, volume={}, number={}, pages={3705-3712}, doi={10.1109/IROS.2014.6943082}}


Abstract

Reachable volumes are a geometric representation of the regions the joints of a robot can reach. They can be used to generate constraint satisfying samples for problems including complicated linkage robots (e.g. closed chains and graspers). They can also be used to assist robot operators and to help in robot design.We show that reachable volumes have an O(1) complexity in unconstrained problems as well as in many constrained problems. We also show that reachable volumes can be computed in linear time and that reachable volume samples can be generated in linear time in problems without constraints. We experimentally validate reachable volume sampling, both with and without constraints on end effectors and/or internal joints. We show that reachable volume samples are less likely to be invalid due to self-collisions, making reachable volume sampling significantly more efficient for higher dimensional problems. We also show that these samples are easier to connect than others, resulting in better connected roadmaps. We demonstrate that our method can be applied to 262-dof, multi-loop, and tree-like linkages including combinations of planar, prismatic and spherical joints. In contrast, existing methods either cannot be used for these problems or do not produce good quality solutions.


KLA: A New Algorithmic Paradigm for Parallel Graph Computations (Conference Best Paper), Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. IEEE Int.Conf. on Parallel Architectures and Compilation Techniques (PACT), pp. 27-38, Edmonton, AB, Canada, Aug 2014. DOI: 10.1145/2628071.2628091
Keywords: Parallel Algorithms, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/2628071.2628091,
author = {Harshvardhan and Fidel, Adam and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {KLA: A New Algorithmic Paradigm for Parallel Graph Computations},
year = {2014},
isbn = {9781450328098},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2628071.2628091},
doi = {10.1145/2628071.2628091},
abstract = {This paper proposes a new algorithmic paradigm - k-level asynchronous (KLA) - that bridges level-synchronous and asynchronous paradigms for processing graphs. The KLA paradigm enables the level of asynchrony in parallel graph algorithms to be parametrically varied from none (level-synchronous) to full (asynchronous). The motivation is to improve execution times through an appropriate trade-off between the use of fewer, but more expensive global synchronizations, as in level-synchronous algorithms, and more, but less expensive local synchronizations (and perhaps also redundant work), as in asynchronous algorithms. We show how common patterns in graph algorithms can be expressed in the KLA pardigm and provide techniques for determining k, the number of asynchronous steps allowed between global synchronizations. Results of an implementation of KLA in the STAPL Graph Library show excellent scalability on up to 96K cores and improvements of 10x or more over level-synchronous and asynchronous versions for graph algorithms such as breadth-first search, PageRank, k-core decomposition and others on certain classes of real-world graphs.},
booktitle = {Proceedings of the 23rd International Conference on Parallel Architectures and Compilation},
pages = {27–38},
numpages = {12},
keywords = {parallel algorithms, big data, distributed computing, graph analytics, asynchronous graph algorithms},
location = {Edmonton, AB, Canada},
series = {PACT '14}
}


Abstract

This paper proposes a new algorithmic paradigm - k-level asynchronous (KLA) - that bridges level-synchronous and asynchronous paradigms for processing graphs. The KLA paradigm enables the level of asynchrony in parallel graph algorithms to be parametrically varied from none (level-synchronous) to full (asynchronous). The motivation is to improve execution times through an appropriate trade-off between the use of fewer, but more expensive global synchronizations, as in level-synchronous algorithms, and more, but less expensive local synchronizations (and perhaps also redundant work), as in asynchronous algorithms. We show how common patterns in graph algorithms can be expressed in the KLA pardigm and provide techniques for determining k, the number of asynchronous steps allowed between global synchronizations. Results of an implementation of KLA in the STAPL Graph Library show excellent scalability on up to 96K cores and improvements of 10× or more over level-synchronous and asynchronous versions for graph algorithms such as breadth-first search, PageRank, k-core decomposition and others on certain classes of real-world graphs.


From Petascale to the Pocket: Adaptively Scaling Parallel Programs for Mobile SoCs, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. IEEE Int.Conf. on Parallel Architectures and Compilation Techniques (PACT), SRC Poster, pp. 511-512, Edmonton, AB, Canada, Aug 2014. DOI: 10.1145/2628071.2671426
Keywords: Parallel Programming, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/2628071.2671426,
author = {Fidel, Adam and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {From Petascale to the Pocket: Adaptively Scaling Parallel Programs for Mobile SoCs},
year = {2014},
isbn = {9781450328098},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2628071.2671426},
doi = {10.1145/2628071.2671426},
booktitle = {Proceedings of the 23rd International Conference on Parallel Architectures and Compilation},
pages = {511–512},
numpages = {2},
keywords = {energy-efficient computing, parallel libraries, shared memory optimization},
location = {Edmonton, AB, Canada},
series = {PACT '14}
}


Abstract

With resource-constrained mobile and embedded devices being outfitted with multicore processors, there exists a need to allow existing parallel programs to be scaled down to efficiently utilize these devices. We study the marriage of programming models originally designed for distributed-memory supercomputers with smaller scale parallel architectures that are shared-memory and generally resource-constrained.


Processing Big Data Graphs on Memory-Restricted Systems, Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger, In Proc. IEEE Int.Conf. on Parallel Architectures and Compilation Techniques (PACT), pp. 517-518, Edmonton, AB, Canada, Aug 2014. DOI: 10.1145/2628071.2671429
Keywords: Big Data, Out-Of-Core Graph Algorithms, Parallel Graph Algorithms
Links : [Published]

BibTex

@inproceedings{10.1145/2628071.2671429,
author = {Harshvardhan and Amato, Nancy M. and Rauchweger, Lawrence},
title = {Processing Big Data Graphs on Memory-Restricted Systems},
year = {2014},
isbn = {9781450328098},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2628071.2671429},
doi = {10.1145/2628071.2671429},
abstract = {With the advent of big-data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques, which can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance.In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into subgraphs that fit in RAM and uses a paging-like technique to load subgraphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16,000+ cores.},
booktitle = {Proceedings of the 23rd International Conference on Parallel Architectures and Compilation},
pages = {517–518},
numpages = {2},
keywords = {out-of-core graph algorithms, distributed computing, graph analytics, big data, parallel graph processing},
location = {Edmonton, AB, Canada},
series = {PACT '14}
}


Abstract

With the advent of big-data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques, which can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance. Contribution. In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into subgraphs that fit in RAM and uses a paging-like technique to load subgraphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16, 000+ cores.


A Region-Based Strategy for Collaborative Roadmap Construction, Jory Denny, Read Sandstrom, Nicole Julian, Nancy M. Amato, Algorithmic Foundations of Robotics XI: Selected Contributions of the Eleventh International Workshop on the Algorithmic Foundations of Robotics, pp. 125-141, Istanbul, Turkey, Aug 2014. DOI: 10.1007/978-3-319-16595-0_8
Keywords: Collaborative Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@Inbook{Denny2015,
author="Denny, Jory
and Sandstr{\"o}m, Read
and Julian, Nicole
and Amato, Nancy M.",
editor="Akin, H. Levent
and Amato, Nancy M.
and Isler, Volkan
and van der Stappen, A. Frank",
title="A Region-Based Strategy for Collaborative Roadmap Construction",
bookTitle="Algorithmic Foundations of Robotics XI: Selected Contributions of the Eleventh International Workshop on the Algorithmic Foundations of Robotics",
year="2015",
publisher="Springer International Publishing",
address="Cham",
pages="125--141",
abstract="Motion planningMotion planninghasDenny, JoryseenJulian, NicolemuchAmato, NancyattentionSandstrom, Readover the past two decades. A great deal of progress has been made in sampling-based planningSampling-based planning, whereby a planner builds an approximate representation of the planning space. While these planners have demonstrated success in many scenarios, there are still difficult problems where they lack robustness or efficiency, e.g., certain types of narrow spaces. Conversely, human intuition can often determine an approximate solution to these problems quite effectively, but humans lack the speed and precision necessary to perform the corresponding low-level tasks (such as collision checking) in a timely manner. In this work, we introduce a novel strategy called Region Steering in which the user and a PRM planner work cooperatively to map the space while maintaining the probabilistic completeness property of the PRM planner. Region Steering utilizes two-way communication to integrate the strengths of both the user and the planner, thereby overcoming the weaknesses inherent to relying on either one alone. In one communication direction, a user can input regions, or bounding volumes in the workspace, to bias sampling towards or away from these areas. In the other direction, the planner displays its progress to the user and colors the regions based on their perceived usefulness. We demonstrate that Region Steering provides roadmap customizability, reduced mapping time, and smaller roadmap sizes compared with fully automated PRMs, e.g., Gaussian PRM.",
isbn="978-3-319-16595-0",
doi="10.1007/978-3-319-16595-0_8",
url="https://doi.org/10.1007/978-3-319-16595-0_8"
}


Abstract

Motion planning has seen much attention over the past two decades. A great deal of progress has been made in sampling-based planning, whereby a planner builds an approximate representation of the planning space. While these planners have demonstrated success in many scenarios, there are still difficult problems where they lack robustness or efficiency, e.g., certain types of narrow spaces. Conversely, human intuition can often determine an approximate solution to these problems quite effectively, but humans lack the speed and precision necessary to perform the corresponding low-level tasks (such as collision checking) in a timely manner. In this work, we introduce a novel strategy called Region Steering in which the user and a PRM planner work cooperatively to map the space while maintaining the probabilistic completeness property of the PRM planner. Region Steering utilizes two-way communication to integrate the strengths of both the user and the planner, thereby overcoming the weaknesses inherent to relying on either one alone. In one communication direction, a user can input regions, or bounding volumes in the workspace, to bias sampling towards or away from these areas. In the other direction, the planner displays its progress to the user and colors the regions based on their perceived usefulness. We demonstrate that Region Steering provides roadmap customizability, reduced mapping time, and smaller roadmap sizes compared with fully automated PRMs, e.g., Gaussian PRM.


Load balancing n-body simulations with highly non-uniform density, Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Tom Arsenlis, Nancy M. Amato, International Conference on Supercomputing (ICS), pp. 113-122, Muenchen, Germany, Jun 2014. DOI: 10.1145/2597652.2597659
Keywords: Parallel Programming
Links : [Published]

BibTex

@inproceedings{DBLP:conf/ics/PearceGSAA14,
author = {Olga Pearce and
Todd Gamblin and
Bronis R. de Supinski and
Tom Arsenlis and
Nancy M. Amato},
editor = {Arndt Bode and
Michael Gerndt and
Per Stenstr{\"{o}}m and
Lawrence Rauchwerger and
Barton P. Miller and
Martin Schulz},
title = {Load balancing n-body simulations with highly non-uniform density},
booktitle = {2014 International Conference on Supercomputing, ICS'14, Muenchen,
Germany, June 10-13, 2014},
pages = {113--122},
publisher = {{ACM}},
year = {2014},
url = {https://doi.org/10.1145/2597652.2597659},
doi = {10.1145/2597652.2597659},
timestamp = {Tue, 06 Nov 2018 11:07:02 +0100},
biburl = {https://dblp.org/rec/conf/ics/PearceGSAA14.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

N-body methods simulate the evolution of systems of particles (or bodies). They are critical for scientific research in fields as diverse as molecular dynamics, astrophysics, and material science. Most load balancing techniques for N-body methods use particle count to approximate computational work. This approximation is inaccurate, especially for systems with high density variation, because work in an N-body simulation is proportional to the particle density, not the particle count. In this paper, we demonstrate that existing techniques do not perform well at scale when particle density is highly non-uniform, and we propose a load balance technique that efficiently assigns load in terms of interactions instead of particles. We use adaptive sampling to create an even work distribution more amenable to partitioning, and to reduce partitioning overhead. We implement and evaluate our approach on a Barnes-Hut algorithm and a large-scale dislocation dynamics application, ParaDiS. Our method achieves up to 26% improvement in overall performance of Barnes-Hut and 18% in ParaDiS.


Sampling-Based Motion Planning with Reachable Volumes: Theoretical Foundations, Troy McMahon, Shawna Thomas, Nancy M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, Jun 2014. DOI: 10.1109/ICRA.2014.6907820
Keywords: Manipulation Planning, Reachability Analysis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6907820, author={T. {McMahon} and S. {Thomas} and N. M. {Amato}}, booktitle={2014 IEEE International Conference on Robotics and Automation (ICRA)}, title={Sampling-based motion planning with reachable volumes: Theoretical foundations}, year={2014}, volume={}, number={}, pages={6514-6521}, doi={10.1109/ICRA.2014.6907820}}


Abstract

We introduce a new concept, reachable volumes, that denotes the set of points that the end effector of a chain or linkage can reach. We show that the reachable volume of a chain is equivalent to the Minkowski sum of the reachable volumes of its links, and give an efficient method for computing reachable volumes. We present a method for generating configurations using reachable volumes that is applicable to various types of robots including open and closed chain robots, tree-like robots, and complex robots including both loops and branches. We also describe how to apply constraints (both on end effectors and internal joints) using reachable volumes. Unlike previous methods, reachable volumes work for spherical and prismatic joints as well as planar joints. Visualizations of reachable volumes can allow an operator to see what positions the robot can reach and can guide robot design. We present visualizations of reachable volumes for representative robots including closed chains and graspers as well as for examples with joint and end effector constraints.


MARRT: Medial Axis Biased Rapidly-Exploring Random Trees, Jory Denny, Evan Greco, Shawna L. Thomas, Nancy M. Amato, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 90-97, Hong Kong, China, Jun 2014. DOI: 10.1109/ICRA.2014.6906594
Keywords: Motion Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6906594, author={J. {Denny} and E. {Greco} and S. {Thomas} and N. M. {Amato}}, booktitle={2014 IEEE International Conference on Robotics and Automation (ICRA)}, title={MARRT: Medial Axis biased rapidly-exploring random trees}, year={2014}, volume={}, number={}, pages={90-97}, doi={10.1109/ICRA.2014.6906594}}


Abstract

Motion planning is a difficult and widely studied problem in robotics. Current research aims not only to find feasible paths, but to ensure paths have certain properties, e.g., shortest or safest paths. This is difficult for current state-of-the-art sampling-based techniques as they typically focus on simply finding any path. Despite this difficulty, sampling-based techniques have shown great success in planning for a wide range of applications. Among such planners, Rapidly-Exploring Random Trees (RRTs) search the planning space by biasing exploration toward unexplored regions. This paper introduces a novel RRT variant, Medial Axis RRT (MARRT), which biases tree exploration to the medial axis of free space by pushing all configurations from expansion steps towards the medial axis. We prove that this biasing increases the tree's clearance from obstacles. Improving obstacle clearance is useful where path safety is important, e.g., path planning for robots performing tasks in close proximity to the elderly. Finally, we experimentally analyze MARRT, emphasizing its ability to effectively map difficult passages while increasing obstacle clearance, and compare it to contemporary RRT techniques.


Spark PRM: Using RRTs within PRMs to efficiently explore narrow passages, Kensen Shi, Jory Denny, Nancy M. Amato, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 4659-4666, Hong Kong, China, Jun 2014. DOI: 10.1109/ICRA.2014.6907540
Keywords: Motion Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{shi-icra-2014
, author = "Kensen Shi and Jory Denny and Nancy M. Amato"
, title = "Spark PRM: Using RRTs within PRMs to efficiently explore narrow passages"
, booktitle = "IEEE International Conference on Robotics and Automation (ICRA)"
, location = "Hong Kong"
, month = "May"
, year = "2014"
, pages = "4659-4666"
, doi =10.1109/ICRA.2014.6907540"
}


Abstract

Probabilistic RoadMaps (PRMs) have been successful for many high-dimensional motion planning problems. However, they encounter difficulties when mapping narrow passages. While many PRM sampling methods have been proposed to increase the proportion of samples within narrow passages, such difficult planning areas still pose many challenges. We introduce a novel algorithm, Spark PRM, that sparks the growth of Rapidly-expanding Random Trees (RRTs) from narrow passage samples generated by a PRM. The RRT rapidly generates further narrow passage samples, ideally until the passage is fully mapped. After reaching a terminating condition, the tree stops growing and is added to the roadmap. Spark PRM is a general method that can be applied to all PRM variants. We study the benefits of Spark PRM with a variety of sampling strategies in a wide array of environments. We show significant speedups in computation time over RRT, Sampling-based Roadmap of Trees (SRT), and various PRM variants.


UMAPRM: Uniformly Sampling the Medial Axis, Hsin-Yi (Cindy) Yeh, Jory Denny, Aaron Lindsey, Shawna L. Thomas, Nancy M. Amato, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 5798 -5803, Hong Kong, China, Jun 2014. DOI: 10.1109/ICRA.2014.6907711
Keywords: Motion Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6907711, author={H. C. {Yeh} and J. {Denny} and A. {Lindsey} and S. {Thomas} and N. M. {Amato}}, booktitle={2014 IEEE International Conference on Robotics and Automation (ICRA)}, title={UMAPRM: Uniformly sampling the medial axis}, year={2014}, volume={}, number={}, pages={5798-5803}, doi={10.1109/ICRA.2014.6907711}}


Abstract

Maintaining clearance, or distance from obstacles, is a vital component of successful motion planning algorithms. Maintaining high clearance often creates safer paths for robots. Contemporary sampling-based planning algorithms that utilize the medial axis, or the set of all points equidistant to two or more obstacles, produce higher clearance paths. However, they are biased heavily toward certain portions of the medial axis, sometimes ignoring parts critical to planning, e.g., specific types of narrow passages. We introduce Uniform Medial Axis Probabilistic RoadMap (UMAPRM), a novel planning variant that generates samples uniformly on the medial axis of the free portion of C space . We theoretically analyze the distribution generated by UMAPRM and show its uniformity. Our results show that UMAPRM's distribution of samples along the medial axis is not only uniform but also preferable to other medial axis samplers in certain planning problems. We demonstrate that UMAPRM has negligible computational overhead over other sampling techniques and can solve problems the others could not, e.g., a bug trap. Finally, we demonstrate UMAPRM successfully generates higher clearance paths in the examples.


Reciprocally-Rotating Velocity Obstacles, Andrew Giese, Daniel Latypov, Nancy M. Amato, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 3234-3241, Hong Kong, China, Jun 2014. DOI: 10.1109/ICRA.2014.6907324
Keywords: Motion Planning, Multi-Agent
Links : [Published]

BibTex

@INPROCEEDINGS{6907324, author={A. {Giese} and D. {Latypov} and N. M. {Amato}}, booktitle={2014 IEEE International Conference on Robotics and Automation (ICRA)}, title={Reciprocally-Rotating Velocity Obstacles}, year={2014}, volume={}, number={}, pages={3234-3241}, doi={10.1109/ICRA.2014.6907324}}


Abstract

Modern multi-agent systems frequently use high-level planners to extract basic paths for agents and then rely on local collision avoidance to ensure that the agents reach their destinations without colliding with one another or dynamic obstacles. One state-of-the-art local collision avoidance technique is Optimal Reciprocal Collision Avoidance (ORCA). Despite being fast and efficient for circular-shaped agents, ORCA may deadlock when polygonal shapes are used. To address this shortcoming, we introduce Reciprocally-Rotating Velocity Obstacles (RRVO). RRVO generalizes ORCA by introducing a notion of rotation for polygonally-shaped agents. This generalization permits more realistic motion than ORCA and does not suffer from as much deadlock. In this paper, we present the theory of RRVO and show empirically that it does not suffer from the deadlock issue ORCA has, permits agents to reach goals faster, and has a comparable collision rate at the cost of performance overhead quadratic in the (typically small) user-defined parameter δ.


Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms, Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Nancy M. Amato, Lawrence Rauchwerger, In Proc. Int. Par. and Dist. Proc. Symp. (IPDPS), pp. 573-582, Phoenix, AZ, USA, May 2014. DOI: 10.1109/IPDPS.2014.66
Keywords: Parallel Algorithms, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6877290, author={A. {Fidel} and S. A. {Jacobs} and S. {Sharma} and N. M. {Amato} and L. {Rauchwerger}}, booktitle={2014 IEEE 28th International Parallel and Distributed Processing Symposium}, title={Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms}, year={2014}, volume={}, number={}, pages={573-582},}


Abstract

Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the sub problems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores.


Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms, Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Nancy M. Amato, Lawrence Rauchwerger, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 573-582, Phoenix, AZ, USA, May 2014. DOI: 10.1109/IPDPS.2014.66
Keywords: Parallel Algorithms, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6877290, author={A. {Fidel} and S. A. {Jacobs} and S. {Sharma} and N. M. {Amato} and L. {Rauchwerger}}, booktitle={2014 IEEE 28th International Parallel and Distributed Processing Symposium}, title={Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms}, year={2014}, volume={}, number={}, pages={573-582}, doi={10.1109/IPDPS.2014.66}}


Abstract

Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the sub problems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores.


Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms, Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Nancy M. Amato, Lawrence Rauchwerger, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 573-582, Phoenix, AZ, USA, May 2014. DOI: 10.1109/IPDPS.2014.66
Keywords: Parallel Algorithms, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6877290, author={A. {Fidel} and S. A. {Jacobs} and S. {Sharma} and N. M. {Amato} and L. {Rauchwerger}}, booktitle={2014 IEEE 28th International Parallel and Distributed Processing Symposium}, title={Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms}, year={2014}, volume={}, number={}, pages={573-582}, doi={10.1109/IPDPS.2014.66}}


Abstract

Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the sub problems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores.


Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms, Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Nancy M. Amato, Lawrence Rauchwerger, In. Proc. IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 2014. DOI: 10.1109/IPDPS.2014.66
Keywords: Parallel Algorithms, Parallel Processing, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6877290, author={A. {Fidel} and S. A. {Jacobs} and S. {Sharma} and N. M. {Amato} and L. {Rauchwerger}}, booktitle={2014 IEEE 28th International Parallel and Distributed Processing Symposium}, title={Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms}, year={2014}, volume={}, number={}, pages={573-582}, doi={10.1109/IPDPS.2014.66}}


Abstract

Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the sub problems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores.


A Scalable Framework for Parallelizing Sampling-Based Motion Planning Algorithms, Samson Ade Jacobs, Doctoral dissertation, Texas A & M University, College Station, Texas, USA, May 2014.
Keywords: Parallel Algorithms, Parallel Processing, Sampling-Based Motion Planning
Links : [Published]

BibTex

@phdthesis{jacobs2014scalable,
title={A Scalable Framework for Parallelizing Sampling-Based Motion Planning Algorithms},
author={Jacobs, Samson Ade},
year={2014}
}


Abstract

Motion planning is defined as the problem of finding a valid path taking a robot (or any movable object) from a given start configuration to a goal configuration in an environment. While motion planning has its roots in robotics, it now finds application in many other areas of scientific computing such as protein folding, drug design, virtual prototyping, computer-aided design (CAD), and computer animation. These new areas test the limits of the best sequential planners available, motivating the need for methods that can exploit parallel processing. This dissertation focuses on the design and implementation of a generic and scalable framework for parallelizing motion planning algorithms. In particular, we focus on sampling-based motion planning algorithms which are considered to be the state-of-the-art. Our work covers the two broad classes of sampling-based motion planning algorithms--the graph-based and the tree-based methods. Central to our approach is the subdivision of the planning space into regions. These regions represent sub- problems that can be processed in parallel. Solutions to the sub-problems are later combined to form a solution to the entire problem. By subdividing the planning space and restricting the locality of connection attempts to adjacent regions, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We also describe how load balancing strategies can be applied in complex environments. We present experimental results that scale to thousands of processors on different massively parallel machines for a range of motion planning problems.


Robust Online Belief Space Planning in Changing Environments: Application to Physical Mobile Robots, Ali-akbar Agha-mohammadi, Saurav Agarwal, Aditya Mahadevan, Suman Chakravorty, Daniel Tomkins, Jory Denny, Nancy M. Amato, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 149-156, Hong Kong, China, May 2014. DOI: 10.1109/ICRA.2014.6906602
Keywords: Belief Space, Mobile Robots, Sampling-Based Motion Planning
Links : [Published] [Manuscript]

BibTex

@INPROCEEDINGS{6906602, author={A. {Agha-mohammadi} and S. {Agarwal} and A. {Mahadevan} and S. {Chakravorty} and D. {Tomkins} and J. {Denny} and N. M. {Amato}}, booktitle={2014 IEEE International Conference on Robotics and Automation (ICRA)}, title={Robust online belief space planning in changing environments: Application to physical mobile robots}, year={2014}, volume={}, number={}, pages={149-156}, doi={10.1109/ICRA.2014.6906602}}


Abstract

Motion planning in belief space (under motion and sensing uncertainty) is a challenging problem due to the computational intractability of its exact solution. The Feedback-based Information RoadMap (FIRM) framework made an important theoretical step toward enabling roadmap-based planning in belief space and provided a computationally tractable version of belief space planning. However, there are still challenges in applying belief space planners to physical systems, such as the discrepancy between computational models and real physical models. In this paper, we propose a dynamic replanning scheme in belief space to address such challenges. Moreover, we present techniques to cope with changes in the environment (e.g., changes in the obstacle map), as well as unforeseen large deviations in the robot's location (e.g., the kidnapped robot problem). We then utilize these techniques to implement the first online replanning scheme in belief space on a physical mobile robot that is robust to changes in the environment and large disturbances. This method demonstrates that belief space planning is a practical tool for robot motion planning.


A Scalable Framework for Parallelizing Sampling-Based Motion Planning Algorithms, Sam Ade Jacobs, Ph.D. Thesis, Department of Computer Science and Engineering, Texas A&M University, Texas A&M University, Apr 2014.
Keywords: Parallel Algorithms, Sampling-Based Motion Planning
Links : [Published]

BibTex

N/A


Abstract

Motion planning is defined as the problem of finding a valid path taking a robot (or any movable object) from a given start configuration to a goal configuration in an environment. While motion planning has its roots in robotics, it now finds application in many other areas of scientific computing such as protein folding, drug design, virtual prototyping, computer-aided design (CAD), and computer animation. These new areas test the limits of the best sequential planners available, motivating the need for methods that can exploit parallel processing. This dissertation focuses on the design and implementation of a generic and scalable framework for parallelizing motion planning algorithms. In particular, we focus on sampling-based motion planning algorithms which are considered to be the state-of-the-art. Our work covers the two broad classes of sampling-based motion planning algorithms--the graph-based and the tree-based methods. Central to our approach is the subdivision of the planning space into regions. These regions represent sub- problems that can be processed in parallel. Solutions to the sub-problems are later combined to form a solution to the entire problem. By subdividing the planning space and restricting the locality of connection attempts to adjacent regions, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We also describe how load balancing strategies can be applied in complex environments. We present experimental results that scale to thousands of processors on different massively parallel machines for a range of motion planning problems.


Adaptive Neighbor Connection using Node Characterization , Chinwe Ekenna , Shawna Thomas , Nancy M. Amato , N/A, N/A, Jan 2014.
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@MISC{Ekenna_adaptiveneighbor,
author = {Chinwe Ekenna and Shawna Thomas and Nancy M. Amato},
title = {Adaptive Neighbor Connection using Node Characterization},
year = {}
}


Abstract

Abstract. Sampling-based motion planning has been successful in plan-ning the motion for a wide variety of robot types. An important primitive of these methods involves connecting nodes by selecting candidate neigh-bors and checking the path between them. Recently, an approach called Adaptive Neighbor Connection (ANC) was proposed to automate neigh-bor selection using machine learning techniques. It adaptively selects a neighbor connection strategy from a candidate set of options and learns which strategy to use by examining their success rates and costs. In this work, we extend ANC’s reward function by characterizing the types of nodes added (including information about their connectivity) after each connection attempt. In doing so, we gain insight into the na-ture of these nodes and their ability to improve roadmap quality. We also refine ANC’s cost function by considering the computation time spent during each connection attempt so as to potentially learn faster and more efficient connectors. We show improved performance over selecting a sin-gle connection strategy and over ignoring node characterization during learning. We present results in a variety of 2D and 3D environments and are able to solve queries in half the time on average compared to the best single connection strategy and to the original ANC work. 1


Toggle PRM: A Coordinated Mapping of C-Free and C-Obstacle in Arbitrary Dimension, Jory Denny and Nancy M. Amato, Algorithmic Foundations of Robotics X. Springer Tracts in Advanced Robotics (STAR). Presented at the 2012 Workshop on the Algorithmic Foundations of Robotics (WAFR)., Vol: 86, pp. 297-312, Dec 2013. DOI: 10.1007/978-3-642-36279-8_18
Keywords: Robotics, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{denny-wafr-2012
, author = "Jory Denny and Nancy M. Amato"
, title = "Toggle PRM: A Coordinated Mapping of C-Free and C-Obstacle in Arbitrary Dimension"
, journal = "Algorithmic Foundations of Robotics X. Springer Tracts in Advanced Robotics (STAR)"
, volume = "86"
, publisher = "Springer, Berlin, Heidelberg."
, pages = "297-312"
, year = "2013"
, doi = "doi.org/10.1007/978-3-642-36279-8_18"
, note = "Presented at the 2012 Workshop on the Algorithmic Foundations of Robotics (WAFR)."
}


Abstract

Motion planning has received much attention over the past 40 years. More than 15 years have passed since the introduction of the successful sampling-based approach known as the Probabilistic RoadMap Method (PRM). PRM and its many variants have demonstrated great success for some high-dimensional problems, but they all have some level of difficulty in the presence of narrow passages. Recently, an approach called Toggle PRM has been introduced whose performance does not degrade for 2-dimensional problems with narrow passages. In Toggle PRM, a simultaneous, coordinated mapping of both C-free and C-obst is performed and every connection attempt augments one of the maps – either validating an edge in the current space or adding a configuration ’witnessing’ the connection failure to the other space. In this paper, we generalize Toggle PRM to d-dimensions and show that the benefits of mapping both C-free and C-obst continue to hold in higher dimensions. In particular, we introduce a new narrow passage characterization, alpha-epsilon-separable narrow passages, which describes the types of passages that can be successfully mapped by Toggle PRM. Intuitively,alpha-epsilson-separable narrow passages are arbitrarily narrow regions of C-free that separate regions of C-obst , at least locally, such as hallways in an office building. We experimentally compare Toggle PRM with other methods in a variety of scenarios with different types of narrow passages and robots with up to 16 dof.


FIRM: Sampling-based feedback motion-planning under motion uncertainty and imperfect measurements, Ali-akbar Agha-mohammadi, Suman Chakravorty, Nancy M. Amato, International Journal of Robotics Research (ijrr), Vol: 33, Issue: 2, pp. 268-304, Nov 2013. DOI: 10.1177/0278364913501564
Keywords: Belief Space, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{doi:10.1177/0278364913501564,
author = {Ali-akbar Agha-mohammadi and Suman Chakravorty and Nancy M Amato},
title ={FIRM: Sampling-based feedback motion-planning under motion uncertainty and imperfect measurements},
journal = {The International Journal of Robotics Research},
volume = {33},
number = {2},
pages = {268-304},
year = {2014},
doi = {10.1177/0278364913501564},

URL = {
https://doi.org/10.1177/0278364913501564

},
eprint = {
https://doi.org/10.1177/0278364913501564

}
,
abstract = { In this paper we present feedback-based information roadmap (FIRM), a multi-query approach for planning under uncertainty which is a belief-space variant of probabilistic roadmap methods. The crucial feature of FIRM is that the costs associated with the edges are independent of each other, and in this sense it is the first method that generates a graph in belief space that preserves the optimal substructure property. From a practical point of view, FIRM is a robust and reliable planning framework. It is robust since the solution is a feedback and there is no need for expensive replanning. It is reliable because accurate collision probabilities can be computed along the edges. In addition, FIRM is a scalable framework, where the complexity of planning with FIRM is a constant multiplier of the complexity of planning with PRM. In this paper, FIRM is introduced as an abstract framework. As a concrete instantiation of FIRM, we adopt stationary linear quadratic Gaussian (SLQG) controllers as belief stabilizers and introduce the so-called SLQG-FIRM. In SLQG-FIRM we focus on kinematic systems and then extend to dynamical systems by sampling in the equilibrium space. We investigate the performance of SLQG-FIRM in different scenarios. }
}


Abstract

In this paper we present feedback-based information roadmap (FIRM), a multi-query approach for planning under uncertainty which is a belief-space variant of probabilistic roadmap methods. The crucial feature of FIRM is that the costs associated with the edges are independent of each other, and in this sense it is the first method that generates a graph in belief space that preserves the optimal substructure property. From a practical point of view, FIRM is a robust and reliable planning framework. It is robust since the solution is a feedback and there is no need for expensive replanning. It is reliable because accurate collision probabilities can be computed along the edges. In addition, FIRM is a scalable framework, where the complexity of planning with FIRM is a constant multiplier of the complexity of planning with PRM. In this paper, FIRM is introduced as an abstract framework. As a concrete instantiation of FIRM, we adopt stationary linear quadratic Gaussian (SLQG) controllers as belief stabilizers and introduce the so-called SLQG-FIRM. In SLQG-FIRM we focus on kinematic systems and then extend to dynamical systems by sampling in the equilibrium space. We investigate the performance of SLQG-FIRM in different scenarios.


Adaptive Neighbor Connection for PRMs: A Natural Fit for Heterogeneous Environments and Parallelism, Chinwe Ekenna, Sam Ade Jacobs, Shawna Thomas, Nancy M. Amato, In. Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, Nov 2013. DOI: 10.1109/IROS.2013.6696510
Keywords: Parallel Algorithms, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6696510, author={C. {Ekenna} and S. A. {Jacobs} and S. {Thomas} and N. M. {Amato}}, booktitle={2013 IEEE/RSJ International Conference on Intelligent Robots and Systems}, title={Adaptive neighbor connection for PRMs: A natural fit for heterogeneous environments and parallelism}, year={2013}, volume={}, number={}, pages={1249-1256}, doi={10.1109/IROS.2013.6696510}}


Abstract

Probabilistic Roadmap Methods (PRMs) are widely used motion planning methods that sample robot configurations (nodes) and connect them to form a graph (roadmap) containing feasible trajectories. Many PRM variants propose different strategies for each of the steps and choosing among them is problem dependent. Planning in heterogeneous environments and/or on parallel machines necessitates dividing the problem into regions where these choices have to be made for each one. Hand-selecting the best method for each region becomes infeasible. In particular, there are many ways to select connection candidates, and choosing the appropriate strategy is input dependent. In this paper, we present a general connection framework that adaptively selects a neighbor finding strategy from a candidate set of options. Our framework learns which strategy to use by examining their success rates and costs. It frees the user of the burden of selecting the best strategy and allows the selection to change over time. We perform experiments on rigid bodies of varying geometry and articulated linkages up to 37 degrees of freedom. Our results show that strategy performance is indeed problem/region dependent, and our adaptive method harnesses their strengths. Over all problems studied, our method differs the least from manual selection of the best method, and if one were to manually select a single method across all problems, the performance can be quite poor. Our method is able to adapt to changing sampling density and learns different strategies for each region when the problem is partitioned for parallelism.


Blind RRT: A Probabilistically Complete Distributed RRT, Cesar Rodriguez, Jory Denny, Sam Ade Jacobs, Shawna Thomas, Nancy M. Amato, In. Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, Nov 2013. DOI: 10.1109/IROS.2013.6696587
Keywords: Parallel Algorithms, Rapidly-exploring Random Tree (RRT), Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6696587, author={C. {Rodriguez} and J. {Denny} and S. A. {Jacobs} and S. {Thomas} and N. M. {Amato}}, booktitle={2013 IEEE/RSJ International Conference on Intelligent Robots and Systems}, title={Blind RRT: A probabilistically complete distributed RRT}, year={2013}, volume={}, number={}, pages={1758-1765}, doi={10.1109/IROS.2013.6696587}}


Abstract

Rapidly-Exploring Random Trees (RRTs) have been successful at finding feasible solutions for many types of problems. With motion planning becoming more computationally demanding, we turn to parallel motion planning for efficient solutions. Existing work on distributed RRTs has been limited by the overhead that global communication requires. A recent approach, Radial RRT, demonstrated a scalable algorithm that subdivides the space into regions to increase the computation locality. However, if an obstacle completely blocks RRT growth in a region, the planning space is not covered and is thus not probabilistically complete. We present a new algorithm, Blind RRT, which ignores obstacles during initial growth to efficiently explore the entire space. Because obstacles are ignored, free components of the tree become disconnected and fragmented. Blind RRT merges parts of the tree that have become disconnected from the root. We show how this algorithm can be applied to the Radial RRT framework allowing both scalability and effectiveness in motion planning. This method is a probabilistically complete approach to parallel RRTs. We show that our method not only scales but also overcomes the motion planning limitations that Radial RRT has in a series of difficult motion planning tasks.


Adapting RRT Growth for Heterogeneous Environments, Jory Denny, Marco Morales, Samuel Rodriguez, Nancy M. Amato, In. Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, Nov 2013. DOI: 10.1109/IROS.2013.6696589
Keywords: Adaptive Algorithm, Rapidly-exploring Random Tree (RRT), Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6696589, author={J. {Denny} and M. {Morales} and S. {Rodriguez} and N. M. {Amato}}, booktitle={2013 IEEE/RSJ International Conference on Intelligent Robots and Systems}, title={Adapting RRT growth for heterogeneous environments}, year={2013}, volume={}, number={}, pages={1772-1778}, doi={10.1109/IROS.2013.6696589}}


Abstract

Rapidly-exploring Random Trees (RRTs) are effective for a wide range of applications ranging from kinodynamic planning to motion planning under uncertainty. However, RRTs are not as efficient when exploring heterogeneous environments and do not adapt to the space. For example, in difficult areas an expensive RRT growth method might be appropriate, while in open areas inexpensive growth methods should be chosen. In this paper, we present a novel algorithm, Adaptive RRT, that adapts RRT growth to the current exploration area using a two level growth selection mechanism. At the first level, we select groups of expansion methods according to the visibility of the node being expanded. Second, we use a cost-sensitive learning approach to select a sampler from the group of expansion methods chosen. Also, we propose a novel definition of visibility for RRT nodes which can be computed in an online manner and used by Adaptive RRT to select an appropriate expansion method. We present the algorithm and experimental analysis on a broad range of problems showing not only its adaptability, but efficiency gains achieved by adapting exploration methods appropriately.


Rigidity Analysis for Protein Motion and Folding Core Identification, Shawna Thomas, Lydia Tapia, Chinwe Ekenna, Hsin-Yi (Cindy) Yeh, Nancy M. Amato, Association for the Advancement of Artificial Intelligence (AAAI) Workshop, Vol: WS-13-06, pp. 38-43, Bellevue, Washington, Jul 2013.
Keywords: Computational Biology, Protein Folding, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{thomas-aaaiW-2013
, author = "Shawna Thomas and Lydia Tapia and Chinwe Ekenna and Hsin-Yi (Cindy) Yeh and Nancy M. Amato "
, title = "Rigidity Analysis for Protein Motion and Folding Core Identification"
, booktitle = "AAAI Workshop on Artificial Intelligence and Robotics Methods in Computational Biology"
, volume = "WS-13-06"
, month = "July"
, address = "Bellevue, Washington, USA"
, pages = "38-43"
, year = "2013"
}


Abstract

Protein folding plays an essential role in protein function and stability. Despite the explosion in our knowledge of structural and functional data, our understanding of protein folding is still very limited. In addition, methods such as folding core identification are gaining importance with the increased desire to engineer proteins with particular functions and efficiencies. However, defining the folding core can be challenging for both experiment and simulation. In this work, we use rigidity analysis to effectively sample and model the protein's energy landscape and identify the folding core. Our results show that rigidity analysis improves the accuracy of our approximate landscape models and produces landscape models that capture the subtle folding differences between protein G and its mutants, NuG1 and NuG2. We then validate our folding core identification against known experimental data and compare to other simulation tools. In addition to correlating well with experiment, our method can suggest other components of structure that have not been identified as part of the core because they were not previously measured experimentally.


Graph-based Stochastic Control with Constraints: A Unified Approach with Perfect and Imperfect Measurements, Ali-akbar Agha-mohammadi, Suman Chakravorty, Nancy M. Amato, In. Proc. American Control Conference, Washington, DC, USA, Jun 2013. DOI: 10.1109/ACC.2013.6580545
Keywords: Belief Space, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6580545, author={A. {Agha-mohammadi} and S. {Chakravorty} and N. M. {Amato}}, booktitle={2013 American Control Conference}, title={Graph-based stochastic control with constraints: A unified approach with perfect and imperfect measurements}, year={2013}, volume={}, number={}, pages={4581-4586}, doi={10.1109/ACC.2013.6580545}}


Abstract

This paper is concerned with the problem of stochastic optimal control (possibly with imperfect measurements) in the presence of constraints. We propose a computationally tractable framework to address this problem. The method lends itself to sampling-based methods where we construct a graph in the state space of the problem, on which a Dynamic Programming (DP) is solved and a closed-loop feedback policy is computed. The constraints are seamlessly incorporated to the control policy selection by including their effect on the transition probabilities of the graph edges. We present a unified framework that is applicable both in the state space (with perfect measurements) and in the information space (with imperfect measurements).


A scalable distributed RRT for motion planning, Sam Ade Jacobs, Nicholas Stradford, Cesar Rodriguez, Shawna Thomas, Nancy M. Amato, 2013 IEEE International Conference on Robotics and Automation, pp. 5088-5095, Karlsruhe, Germany, May 2013. DOI: 10.1109/ICRA.2013.6631304
Keywords: Rapidly-exploring Random Tree (RRT), Sampling-Based Motion Planning
Links :

BibTex

@INPROCEEDINGS{6631304,
author={S. A. {Jacobs} and N. {Stradford} and C. {Rodriguez} and S. {Thomas} and N. M. {Amato}},
booktitle={2013 IEEE International Conference on Robotics and Automation},
title={A scalable distributed RRT for motion planning},
year={2013},
volume={},
number={},
pages={5088-5095},
doi={10.1109/ICRA.2013.6631304}}


Abstract

Rapidly-exploring Random Tree (RRT), like other sampling-based motion planning methods, has been very successful in solving motion planning problems. Even so, sampling-based planners cannot solve all problems of interest efficiently, so attention is increasingly turning to parallelizing them. However, one challenge in parallelizing RRT is the global computation and communication overhead of nearest neighbor search, a key operation in RRTs. This is a critical issue as it limits the scalability of previous algorithms. We present two parallel algorithms to address this problem. The first algorithm extends existing work by introducing a parameter that adjusts how much local computation is done before a global update. The second algorithm radially subdivides the configuration space into regions, constructs a portion of the tree in each region in parallel, and connects the subtrees,i removing cycles if they exist. By subdividing the space, we increase computation locality enabling a scalable result. We show that our approaches are scalable. We present results demonstrating almost linear scaling to hundreds of processors on a Linux cluster and a Cray XE6 machine.


A Scalable Distributed RRT for Motion Planning, Sam Ade Jacobs, Nicholas Stradford, Cesar Rodriguez, Shawna Thomas, Nancy M. Amato, 2013 IEEE International Conference on Robotics and Automation, pp. 5088-5095, Karlsruhe, Germany, May 2013. DOI: 10.1109/ICRA.2013.6631304
Keywords: Parallel Algorithms, Path Planning, Rapidly-exploring Random Tree (RRT)
Links : [Published]

BibTex

@INPROCEEDINGS{6631304,
author={S. A. {Jacobs} and N. {Stradford} and C. {Rodriguez} and S. {Thomas} and N. M. {Amato}},
booktitle={2013 IEEE International Conference on Robotics and Automation},
title={A scalable distributed RRT for motion planning},
year={2013},
volume={},
number={},
pages={5088-5095},
doi={10.1109/ICRA.2013.6631304}}


Abstract

Rapidly-exploring Random Tree (RRT), like other sampling-based motion planning methods, has been very successful in solving motion planning problems. Even so, sampling-based planners cannot solve all problems of interest efficiently, so attention is increasingly turning to parallelizing them. However, one challenge in parallelizing RRT is the global computation and communication overhead of nearest neighbor search, a key operation in RRTs. This is a critical issue as it limits the scalability of previous algorithms. We present two parallel algorithms to address this problem. The first algorithm extends existing work by introducing a parameter that adjusts how much local computation is done before a global update. The second algorithm radially subdivides the configuration space into regions, constructs a portion of the tree in each region in parallel, and connects the subtrees,i removing cycles if they exist. By subdividing the space, we increase computation locality enabling a scalable result. We show that our approaches are scalable. We present results demonstrating almost linear scaling to hundreds of processors on a Linux cluster and a Cray XE6 machine.


Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory, Roger Pearce, Maya Gokhale, Nancy M. Amato, IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 825-836, Cambridge, Massachusetts, USA, May 2013. DOI: 10.1109/IPDPS.2013.72
Keywords: Parallel Graph Algorithms
Links : [Published]

BibTex

@inproceedings{DBLP:conf/ipps/PearceGA13,
author = {Roger A. Pearce and
Maya B. Gokhale and
Nancy M. Amato},
title = {Scaling Techniques for Massive Scale-Free Graphs in Distributed (External)
Memory},
booktitle = {27th {IEEE} International Symposium on Parallel and Distributed Processing,
{IPDPS} 2013, Cambridge, MA, USA, May 20-24, 2013},
pages = {825--836},
publisher = {{IEEE} Computer Society},
year = {2013},
url = {https://doi.org/10.1109/IPDPS.2013.72},
doi = {10.1109/IPDPS.2013.72},
timestamp = {Mon, 20 Apr 2020 17:14:44 +0200},
biburl = {https://dblp.org/rec/conf/ipps/PearceGA13.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge list partitioning technique, designed to accommodate high-degree vertices (hubs) that create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs to reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), K-Core decomposition, and Triangle Counting. We also demonstrate scalability on BG/P Intrepid by comparing to best known Graph500 results [1]. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process thirty-two times larger datasets with only a 39% performance degradation in Traversed Edges Per Second (TEPS).


Lazy Toggle PRM: A Single-Query Approach to Motion Planning, Jory Denny, Kensen Shi, and Nancy M. Amato, Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 2407-2414, Karlsruhe, Germany, May 2013. DOI: 10.1109/ICRA.2013.6630904
Keywords: Robotics, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{denny-icra-2013
, author = "Jory Denny and Kensen Shi and Nancy M. Amato"
, title = "Lazy Toggle PRM: A single-query approach to motion planning"
, booktitle = "IEEE International Conference on Robotics and Automation"
, location = "Karlsruhe, Germany"
, month = "May"
, year - "2013"
, pages = "2407-2414"
, doi = "10.1109/ICRA.2013.6630904"
}


Abstract

Probabilistic RoadMaps (PRMs) are quite successful in solving complex and high-dimensional motion planning problems. While particularly suited for multiple-query scenarios and expansive spaces, they lack efficiency in both solving single-query scenarios and mapping narrow spaces. Two PRM variants separately tackle these gaps. Lazy PRM reduces the computational cost of roadmap construction for single-query scenarios by delaying roadmap validation until query time. Toggle PRM is well suited for mapping narrow spaces by mapping both C free and C obst , which gives certain theoretical benefits. However, fully validating the two resulting roadmaps can be costly. We present a strategy, Lazy Toggle PRM, for integrating these two approaches into a method which is both suited for narrow passages and efficient single-query calculations. This simultaneously addresses two challenges of PRMs. Like Lazy PRM, Lazy Toggle PRM delays validation of roadmaps until query time, but if no path is found, the algorithm augments the roadmap using the Toggle PRM methodology. We demonstrate the effectiveness of Lazy Toggle PRM in a wide range of scenarios, including those with narrow passages and high descriptive complexity (e.g., those described by many triangles), concluding that it is more effective than existing methods in solving difficult queries.


Improving Roadmap Quality through Connected Component Expansion, Juan Burgos, Jory Denny, Nancy M. Amato, Technical Report, TR13-003, Texas A&M University, Apr 2013.
Keywords: Sampling-Based Motion Planning
Links : [Manuscript]

BibTex

N/A


Abstract

Motion planning is the problem of computing valid paths through an environment. However, because computing exact solutions is intractable, sampling-based algorithms, such as Probabilistic RoadMaps (PRMs), have gained popularity. PRMs compute an approximate mapping of the planning space by sacrificing completeness in favor of efficiency. However, these algorithms have certain bottlenecks that hinder performance which causes difficulty mapping narrow or crowded regions, and the cost of nearest-neighbor queries, which is the asymptotic bottleneck of these algorithms. Thus, roadmaps may fail to efficiently capture the connectivity of the planning space. In this paper, we present a set of connected component (CC) expansion algorithms, each with different biases (random expand, expand to the nearest CC, expand away from the host CC, and expand along the medial-axis) and expansion node selection policies (random, farthest from CC’s centroid, and difficult nodes), that create a linked-chain of configurations designed to enable efficient connection of CCs. Given an a priori roadmap quality requirement in terms of roadmap connectivity, we show that when our expansion methods augment PRMs, we reach the required roadmap connectivity in less time.


Load Balancing Techniques for Scalable Parallelization of Sampling-Based Motion Planning Algorithms, Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Lawrence Rauchwerger, Nancy M. Amato, Technical Report, TR13-002 , Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, USA, Mar 2013.
Keywords: Parallel Algorithms, Sampling-Based Motion Planning, STAPL
Links : [Manuscript]

BibTex

@techreport{afidel-tr13-002-2013,
title = "Load Balancing Techniques for Scalable Parallelization of Sampling-Based Motion Planning Algorithms",
author = "Fidel, Adam and Jacobs, Sam Ade and Sharma, Shishir and Rauchwerger, Lawrence and Amato, Nancy M. ",
institution = "Texas A&M University",
address = "College Station, TX, USA",
number = "TR13-002",
year = 2013,
month = mar
}


Abstract

Motion planning, which is the problem of computing feasible paths through an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, particularly as the number of processors is increased, because planning time depends on region characteristics and, for most problems, the heterogeneity of the set of regions increases as the region size decreases. In this work, we introduce two techniques to address the problems of load imbalance in the parallelization of sampling-based motion planning algorithms: bulk-synchronous redistribution and an adaptive workstealing approach. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on 3,000+ cores.


Efficient Massively Parallel Transport Sweeps, W. Daryl Hawkins, Timmie Smith, Michael P. Adams, Lawrence Rauchwerger, Nancy Amato, Marvin L. Adams, Transactions of the American Nuclear Society, Vol: 107, Issue: 1, pp. 477-481, San Diego, CA, USA, Nov 2012.
Keywords: Parallel Algorithms, Scientific Computing, STAPL
Links : [Published]

BibTex

@article{hawkins-tans-2012
, author = "W. Daryl Hawkins and Timmie Smith and Michael P. Adams and Lawrence Rauchwerger and Nancy Amato and Marvin L. Adams"
, title = "Efficient Massively Parallel Transport Sweeps"
, journal = "Trans. American Nuclear Society"
, volume = "107"
, number = "1"
, month = "November"
, year = "2012"
, pages = "477-481"
, issn = "0003-018X"
, note = "2012 ANS Annual Winter Meeting ; Conference date: 11-11-2012 Through 15-11-2012",
}


Abstract

The full-domain “sweep,” in which all angular fluxes in a problem are calculated given previous-iterate values only for the volumetric source, forms the foundation for many iterative methods that have desirable properties. One important property is that iteration counts do not grow with mesh refinement. The sweep solution on parallel machines is complicated by the dependency of a given cell on its upstream neighbors. A sweep algorithm is defined by its partitioning (dividing the domain among processors), aggregation (grouping cells, directions, and energy groups into “tasks”), and scheduling (choosing which task to execute if more than one is available). The work presented here follows that of Bailey and Falgout, who theoretically and computationally evaluated the performance of three sweep algorithms. Their “data-driven” schedule appeared to be optimal—executing the sweep in the minimum possible number of stages—for tested partitionings and aggregations, but they were unable to prove this mathematically and they did not attempt to optimize across possible partitionings and aggregations. Here we consider 3D Cartesian grids of Nx × Ny × Nz spatial cells and simple Px × Py × Pz partitioning, and we permit general aggregation of cells, directions, and energy groups. We have found a provably optimal family of scheduling algorithms and we present results from one of these. We exploit our guaranteed minimum stage count to further optimize sweep-execution time by choosing the best possible partitioning and aggregation parameters. Our results show excellent scaling out to 32,768 cores, significantly better than results previously reported for sweeps and in line with the optimistic projections.


A Multi-Directional Rapidly Exploring Random Graph (mRRG) for Protein Folding, Shuvra Nath, Shawna Thomas, Chinwe Ekenna, Nancy M. Amato, Proc. ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB), pp. 44-51, Orlando, FL, USA, Oct 2012. DOI: 10.1145/2382936.2382942
Keywords: Computational Biology, Protein Folding
Links : [Published]

BibTex

@inproceedings{nath-bcb-2012
, author = "Shuvra Nath and Shawna Thomas and Chinwe Ekenna and Nancy M. Amato"
, title = "A Multi-Directional Rapidly Exploring Random Graph (mRRG) for Protein Folding"
, booktitle = "ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB)"
, pages = "44-51"
, location ="Orlando, FL, USA"
, month = "October"
, year = "2012"
, doi = "10.1145/2382936.2382942"
}


Abstract

Modeling large-scale protein motions, such as those involved in folding and binding interactions, is crucial to better understanding not only how proteins move and interact with other molecules but also how proteins misfold, thus causing many devastating diseases. Robotic motion planning algorithms, such as Rapidly Exploring Random Trees (RRTs), have been successful in simulating protein folding pathways. Here, we propose a new multi-directional Rapidly Exploring Random Graph (mRRG) specifically tailored for proteins. Unlike traditional RRGs which only expand a parent conformation in a single direction, our strategy expands the parent conformation in multiple directions to generate new samples. Resulting samples are connected to the parent conformation and its nearest neighbors. By leveraging multiple directions, mRRG can model the protein motion landscape with reduced computational time compared to several other robotics-based methods for small to moderate-sized proteins. Our results on several proteins agree with experimental hydrogen out-exchange, pulse-labeling, and Φ-value analysis. We also show that mRRG covers the conformation space better as compared to the other computation methods.


Sampling-based nonholonomic motion planning in belief space via Dynamic Feedback Linearization-based FIRM, Ali-akbar Agha-mohammadi, Suman Chakravorty, Nancy M. Amato, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2012. DOI: 10.1109/IROS.2012.6385970
Keywords: Belief Space, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6385970,

author={A. {Agha-mohammadi} and S. {Chakravorty} and N. M. {Amato}},

booktitle={2012 IEEE/RSJ International Conference on Intelligent Robots and Systems},

title={Sampling-based nonholonomic motion planning in belief space via Dynamic Feedback Linearization-based FIRM},

year={2012},

volume={},

number={},

pages={4433-4440},

doi={10.1109/IROS.2012.6385970}}


Abstract

In roadmap-based methods, such as the Probabilistic Roadmap Method (PRM) in deterministic environments or the Feedback-based Information RoadMap (FIRM) in partially observable probabilistic environments, a stabilizing controller is needed to guarantee node reachability in state or belief space. In belief space, it has been shown that belief-node reachability can be achieved using stationary Linear Quadratic Gaussian (LQG) controllers, for linearly controllable systems. However, for nonholonomic systems such as a unicycle model, belief reachability is a challenge. In this paper, we construct a roadmap in information space, where the local planners in partially-observable space are constructed by utilizing a Kalman filter as an estimator along with a Dynamic Feedback Linearization-based (DFL-based) controller as the belief controller. As a consequence, the task of belief stabilization to pre-defined nodes in belief space is accomplished even for nonholonomic systems. Therefore, a query-independent roadmap is generated in belief space that preserves the “principle of optimality”, required in dynamic programming solvers. This method serves as an offline POMDP solver for motion planning in belief space, which can seamlessly take obstacles into account. Experimental results show the efficiency of both individual local planners and the overall planner over the information graph for a nonholonomic model.


The STAPL Parallel Graph Library, Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC), pp. 46-60, Tokyo, Japan, Sep 2012. DOI: 10.1007/978-3-642-37658-0_4
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-642-37658-0_4,
author=\"Harshvardhan
and Fidel, Adam
and Amato, Nancy M.
and Rauchwerger, Lawrence\",
editor=\"Kasahara, Hironori
and Kimura, Keiji\",
title=\"The STAPL Parallel Graph Library\",
booktitle=\"Languages and Compilers for Parallel Computing\",
year=\"2013\",
publisher=\"Springer Berlin Heidelberg\",
address=\"Berlin, Heidelberg\",
pages=\"46--60\",
abstract=\"This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraphpViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges.\",
isbn=\"978-3-642-37658-0\"
}


Abstract

This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges.


Quantifying the effectiveness of load balance algorithms, Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato, International Conference on Supercomputing (ICS), pp. 185-194, Venice, Italy, Jun 2012. DOI: 10.1145/2304576.2304601
Keywords: Adaptive Algorithm
Links : [Published]

BibTex

@inproceedings{DBLP:conf/ics/PearceGSSA12,
author = {Olga Pearce and
Todd Gamblin and
Bronis R. de Supinski and
Martin Schulz and
Nancy M. Amato},
editor = {Utpal Banerjee and
Kyle A. Gallivan and
Gianfranco Bilardi and
Manolis Katevenis},
title = {Quantifying the effectiveness of load balance algorithms},
booktitle = {International Conference on Supercomputing, ICS'12, Venice, Italy,
June 25-29, 2012},
pages = {185--194},
publisher = {{ACM}},
year = {2012},
url = {https://doi.org/10.1145/2304576.2304601},
doi = {10.1145/2304576.2304601},
timestamp = {Tue, 06 Nov 2018 11:07:02 +0100},
biburl = {https://dblp.org/rec/conf/ics/PearceGSSA12.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

Load balance is critical for performance in large parallel applications. An imbalance on today's fastest supercomputers can force hundreds of thousands of cores to idle, and on future exascale machines this cost will increase by over a factor of a thousand. Improving load balance requires a detailed understanding of the amount of computational load per process and an application's simulated domain, but no existing metrics sufficiently account for both factors. Current load balance mechanisms are often integrated into applications and make implicit assumptions about the load. Some strategies place the burden of providing accurate load information, including the decision on when to balance, on the application. Existing application-independent mechanisms simply measure the application load without any knowledge of application elements, which limits them to identifying imbalance without correcting it. Our novel load model couples abstract application information with scalable measurements to derive accurate and actionable load metrics. Using these metrics, we develop a cost model for correcting load imbalance. Our model enables comparisons of the effectiveness of load balancing algorithms in any specific imbalance scenario. Our model correctly selects the algorithm that achieves the lowest runtime in up to 96% of the cases, and can achieve a 19% gain over selecting a single balancing algorithm for all cases.


Logical inference techniques for loop parallelization, Cosmin E. Oancea, Lawrence Rauchwerger, ACM Conference on Programming Language Design and Implementation (PLDI), pp. 509-520, Beijing, China, Jun 2012. DOI: 10.1145/2254064.2254124
Keywords: Parallel Programming
Links : [Published]

BibTex

@inproceedings{DBLP:conf/pldi/OanceaR12,
author = {Cosmin E. Oancea and
Lawrence Rauchwerger},
editor = {Jan Vitek and
Haibo Lin and
Frank Tip},
title = {Logical inference techniques for loop parallelization},
booktitle = {{ACM} {SIGPLAN} Conference on Programming Language Design and Implementation,
{PLDI} '12, Beijing, China - June 11 - 16, 2012},
pages = {509--520},
publisher = {{ACM}},
year = {2012},
url = {https://doi.org/10.1145/2254064.2254124},
doi = {10.1145/2254064.2254124},
timestamp = {Tue, 06 Nov 2018 16:59:30 +0100},
biburl = {https://dblp.org/rec/conf/pldi/OanceaR12.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S=0, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ==> S = 0). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECT-Club and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers.


A Scalable Method for Parallelizing Sampling-Based Motion Planning Algorithms, Sam Ade Jacobs, Kasra Manavi, Juan Burgos, Jory Denny, Shawna Thomas, Nancy M. Amato, 2012 IEEE International Conference on Robotics and Automation, pp. 2529-2536, Saint Paul, MN, USA, May 2012. DOI: 10.1109/ICRA.2012.6225334
Keywords: Parallel Algorithms, Path Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6225334,
author={S. A. {Jacobs} and K. {Manavi} and J. {Burgos} and J. {Denny} and S. {Thomas} and N. M. {Amato}},
booktitle={2012 IEEE International Conference on Robotics and Automation},
title={A scalable method for parallelizing sampling-based motion planning algorithms},
year={2012},
volume={},
number={},
pages={2529-2536},
doi={10.1109/ICRA.2012.6225334}}


Abstract

This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine.


On the Probabilistic Completeness of the Sampling-based Feedback Motion Planners in Belief Space, Ali-akbar Agha-mohammadi, Suman Chakravorty, Nancy M. Amato, IEEE International Conference on Robotics and Automation (ICRA), May 2012. DOI: 10.1109/ICRA.2012.6225062
Keywords: Belief Space, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{6225062,

author={A. {Agha-mohammadi} and S. {Chakravorty} and N. M. {Amato}},

booktitle={2012 IEEE International Conference on Robotics and Automation},

title={On the probabilistic completeness of the sampling-based feedback motion planners in belief space},

year={2012},

volume={},

number={},

pages={3983-3990},

doi={10.1109/ICRA.2012.6225062}}


Abstract

This paper extends the concept of “probabilistic completeness” defined for motion planners in state space (or configuration space) to the concept of “probabilistic completeness under uncertainty” for motion planners in belief space. Accordingly, an approach is proposed to verify the probabilistic completeness of the sampling-based planners in belief space. Finally, through the proposed approach, it is shown that under mild conditions the sampling-based methods constructed based on the abstract framework of FIRM (Feedback-based Information Roadmap Method) are probabilistically complete under uncertainty.


The Toggle Local Planner for sampling-based motion planning, Jory Denny and Nancy M. Amato, Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 1779-1786, St. Paul, MN, USA, May 2012. DOI: 10.1109/ICRA.2012.6225212
Keywords: Robotics, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{denny-icra-2012
, author = "Jory Denny and Nancy M. Amato"
, title = "The Toggle Local Planner for sampling-based motion planning"
, booktitle = "IEEE International Conference on Robotics and Automation (ICRA)"
, location = "Saint Paul, MN, USA"
, month = "May"
, year - "2012"
, pages = "1779-1786"
, doi = "10.1109/ICRA.2012.6225212"
}


Abstract

Sampling-based solutions to the motion planning problem, such as the probabilistic roadmap method (PRM), have become commonplace in robotics applications. These solutions are the norm as the dimensionality of the planning space grows, i.e., d >; 5. An important primitive of these methods is the local planner, which is used for validation of simple paths between two configurations. The most common is the straight-line local planner which interpolates along the straight line between the two configurations. In this paper, we introduce a new local planner, Toggle Local Planner (Toggle LP), which extends local planning to a two-dimensional subspace of the overall planning space. If no path exists between the two configurations in the subspace, then Toggle LP is guaranteed to correctly return false. Intuitively, more connections could be found by Toggle LP than by the straight-line planner, resulting in better connected roadmaps. As shown in our results, this is the case, and additionally, the extra cost, in terms of time or storage, for Toggle LP is minimal. Additionally, our experimental analysis of the planner shows the benefit for a wide array of robots, with DOF as high as 70.


From Days to Seconds: Scalable Parallel Algorithms for Motion Planning, Sam Ade Jacobs, Nancy M. Amato, In ACM Student Research Compet, Conf. on High Performance Computing Networking, Storage and Analysis Companion Proceedings, Seattle, Washington, USA, Nov 2011.
Keywords: High-Performance Computing, Robotics, Sampling-Based Motion Planning
Links : [Published]

BibTex

N/A


Abstract

This work describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine.


Toggle PRM: Simultaneous mapping of C-free and C-obstacle - A study in 2D, Jory Denny and Nancy M. Amato, Proc. IEEE/RSJ Int. Conf. Intel. Rob. Syst. (IROS), pp. 2632-2639, Sep 2011. DOI: 10.1109/IROS.2011.6095102
Keywords: Robotics, Sampling-Based Motion Planning
Links :

BibTex

@proceedings{denny-iros-2011
, author = "Jory Denny and Nancy Amato"
, title = "Toggle PRM: Simultaneous mapping of C-free and C-obstacle - A study in 2D"
, booktitle = "Proc. IEEE/RSJ Int. Conf. Intel. Rob. Syst. (IROS)"
, location = "San Francisco, CA, USA"
, month = "September"
, year = "2011"
, pages = "2632-2639"
, doi = "10.1109/IROS.2011.6095102"
}


Abstract

Motion planning is known to be difficult. Probabilistic planners have made great advances, but still have difficulty for problems that require planning in narrow passages or on surfaces in C space . This work proposes Toggle PRM, a new methodology for PRMs that simultaneously maps both free and obstacle space. In this paper, we focus on 2 DOF problems and show that mapping both spaces leads to increased sampling density in narrow passages and to improved overall efficiency as compared to previous sampling based approaches.


FIRM: Feedback controller-based information-state roadmap - A framework for motion planning under uncertainty, Ali-akbar Agha-mohammadi, Suman Chakravorty, Nancy M. Amato, IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, Sep 2011. DOI: 10.1109/IROS.2011.6095010
Keywords: Algorithms, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{agha2011firm,
title={FIRM: Feedback controller-based Information-state RoadMap-a framework for motion planning under uncertainty},
author={Agha-Mohammadi, Ali-Akbar and Chakravorty, Suman and Amato, Nancy M},
booktitle={2011 IEEE/RSJ International Conference on Intelligent Robots and Systems},
pages={4284--4291},
year={2011},
organization={IEEE}
}


Abstract

Direct transformation of sampling-based motion planning methods to the Information-state (belief) space is a challenge. The main bottleneck for roadmap-based techniques in belief space is that the incurred costs on different edges of the graph are not independent of each other. In this paper, we generalize the Probabilistic RoadMap (PRM) framework to obtain a Feedback controller-based Information-state RoadMap (FIRM) that takes into account motion and sensing uncertainty in planning. The FIRM nodes and edges lie in belief space and the crucial feature of FIRM is that the costs associated with different edges of FIRM are independent of each other. Therefore, this construct essentially breaks the “curse of history” in the original Partially Observable Markov Decision Process (POMDP), which models the planning problem. Further, we show how obstacles can be rigorously incorporated into planning on FIRM. All these properties stem from utilizing feedback controllers in the construction of FIRM.


A Hybrid Approach to Proving Memory Reference Monotonicity, Cosmin E. Oancea, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing, 24th International Workshop (LCPC), Vol: 7146, pp. 61-75, Fort Collins, Colorado, USA, Sep 2011. DOI: 10.1007/978-3-642-36036-7\_5
Keywords: Parallel Programming
Links : [Published]

BibTex

@inproceedings{DBLP:conf/lcpc/OanceaR11,
author = {Cosmin E. Oancea and
Lawrence Rauchwerger},
editor = {Sanjay V. Rajopadhye and
Michelle Mills Strout},
title = {A Hybrid Approach to Proving Memory Reference Monotonicity},
booktitle = {Languages and Compilers for Parallel Computing, 24th International
Workshop, {LCPC} 2011, Fort Collins, CO, USA, September 8-10, 2011.
Revised Selected Papers},
series = {Lecture Notes in Computer Science},
volume = {7146},
pages = {61--75},
publisher = {Springer},
year = {2011},
url = {https://doi.org/10.1007/978-3-642-36036-7\_5},
doi = {10.1007/978-3-642-36036-7\_5},
timestamp = {Tue, 14 May 2019 10:00:47 +0200},
biburl = {https://dblp.org/rec/conf/lcpc/OanceaR11.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

Array references indexed by non-linear expressions or subscript arrays represent a major obstacle to compiler analysis and to automatic parallelization. Most previous proposed solutions either enhance the static analysis repertoire to recognize more patterns, to infer array-value properties, and to refine the mathematical support, or apply expensive run time analysis of memory reference traces to disambiguate these accesses. This paper presents an automated solution based on static construction of access summaries, in which the reference non-linearity problem can be solved for a large number of reference patterns by extracting arbitrarily-shaped predicates that can (in)validate the reference monotonicity property and thus (dis)prove loop independence. Experiments on six benchmarks show that our general technique for dynamic validation of the monotonicity property can cover a large class of codes, incurs minimal run-time overhead and obtains good speedups.


Toward Realistic Pursuit-Evasion Using a Roadmap-Based Approach, Samuel Rodriguez, Jory Denny, Juan Burgos, Aditya Mahadevan, Kasra Manavi, Luke Murray, Anton Kodochygov, Takis Zourntos, and Nancy M. Amato, IEEE International Conference on Robotics and Automation, Shanghai, China, May 2011. DOI: 10.1109/ICRA.2011.5980467
Keywords: Group Behavior, Multi-Agent, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{rodriguez2011toward,
title={Toward realistic pursuit-evasion using a roadmap-based approach},
author={Rodriguez, Samuel and Denny, Jory and Burgos, Juan and Mahadevan, Aditya and Manavi, Kasra and Murray, Luke and Kodochygov, Anton and Zourntos, Takis and Amato, Nancy M},
booktitle={2011 IEEE International Conference on Robotics and Automation},
pages={1738--1745},
year={2011},
organization={IEEE}
}


Abstract

In this work, we describe an approach for modeling and simulating group behaviors for pursuit-evasion that uses a graph-based representation of the environment and integrates multi-agent simulation with roadmap-based path planning. Our approach can be applied to more realistic scenarios than are typically studied in most previous work, including agents moving in 3D environments such as terrains, multi-story buildings, and dynamic environments. We also support more realistic three-dimensional visibility computations that allow evading agents to hide in crowds or behind hills. We demonstrate the utility of this approach on mobile robots and in simulation for a variety of scenarios including pursuit-evasion and tag on terrains, in multi-level buildings, and in crowds.


The STAPL Parallel Container Framework, Gabriel Tanase, Antal Buss, Adam Fidel, Harshvardhan, Ioannis Papadopoulos, Olga Pearce, Timmie Smith, Nathan Thomas, Xiabing Xu, Nedhal Mourad, Jeremy Vu, Mauro Bianco, Nancy M. Amato, Lawrence Rauchwerger, Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 235-246, San Antonio, TX, USA, Feb 2011. DOI: 10.1145/1941553.1941586
Keywords: Parallel Containers, Parallel Programming, STAPL
Links : [Published]

BibTex

@article{10.1145/2038037.1941586,
author = {Tanase, Gabriel and Buss, Antal and Fidel, Adam and Harshvardhan and Papadopoulos, Ioannis and Pearce, Olga and Smith, Timmie and Thomas, Nathan and Xu, Xiabing and Mourad, Nedal and Vu, Jeremy and Bianco, Mauro and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {The STAPL Parallel Container Framework},
year = {2011},
issue_date = {August 2011},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {46},
number = {8},
issn = {0362-1340},
url = {https://doi.org/10.1145/2038037.1941586},
doi = {10.1145/2038037.1941586},
abstract = {The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. In this work, we present the STAPL Parallel Container Framework (PCF), that is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers, without requiring the programmer to deal with concurrency or data distribution issues. The PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The PCF provides a class hierarchy and a composition mechanism that allows users to extend and customize the current container base for improved application expressivity and performance. We evaluate STAPL pContainer performance on a CRAY XT4 massively parallel system and show that pContainer methods, generic pAlgorithms, and different applications provide good scalability on more than 16,000 processors.},
journal = {SIGPLAN Not.},
month = feb,
pages = {235–246},
numpages = {12},
keywords = {languages, libraries, data, structures, containers, parallel}
}

@inproceedings{10.1145/1941553.1941586,
author = {Tanase, Gabriel and Buss, Antal and Fidel, Adam and Harshvardhan and Papadopoulos, Ioannis and Pearce, Olga and Smith, Timmie and Thomas, Nathan and Xu, Xiabing and Mourad, Nedal and Vu, Jeremy and Bianco, Mauro and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {The STAPL Parallel Container Framework},
year = {2011},
isbn = {9781450301190},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1941553.1941586},
doi = {10.1145/1941553.1941586},
abstract = {The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. In this work, we present the STAPL Parallel Container Framework (PCF), that is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers, without requiring the programmer to deal with concurrency or data distribution issues. The PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The PCF provides a class hierarchy and a composition mechanism that allows users to extend and customize the current container base for improved application expressivity and performance. We evaluate STAPL pContainer performance on a CRAY XT4 massively parallel system and show that pContainer methods, generic pAlgorithms, and different applications provide good scalability on more than 16,000 processors.},
booktitle = {Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming},
pages = {235–246},
numpages = {12},
keywords = {containers, libraries, structures, languages, parallel, data},
location = {San Antonio, TX, USA},
series = {PPoPP \'11}
}


Abstract

The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. In this work, we present the STAPL Parallel Container Framework (PCF), that is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers, without requiring the programmer to deal with concurrency or data distribution issues. The PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The PCF provides a class hierarchy and a composition mechanism that allows users to extend and customize the current container base for improved application expressivity and performance. We evaluate STAPL pContainer performance on a CRAY XT4 massively parallel system and show that pContainer methods, generic pAlgorithms, and different applications provide good scalability on more than 16,000 processors.


Utilizing Roadmaps in Evacuation Planning, Samuel Rodriguez, Nancy M. Amato, The International Journal of Virtual Reality, Vol: 10, Issue: 1, Jan 2011. DOI: https://doi.org/10.20870/IJVR.2011.10.1.2804
Keywords: Motion Planning
Links : [Published]

BibTex

@article{rodriguez2011utilizing,
title={Utilizing roadmaps in evacuation planning},
author={Rodriguez, Samuel and Amato, Nancy M},
journal={International Journal of Virtual Reality},
volume={10},
number={1},
pages={67--73},
year={2011}
}


Abstract

In this paper we describe utilization of roadmaps in a general evacuation planning system for complex 3D environments. The problem consists of heterogeneous groups of agents whose behaviors and properties affect usage of the environment when creating evacuation plans. This planning system includes behaviors for those agents evacuating and directors that may be guiding the agents to improve evacuation. One aspect we focus on is modeling different forms of direction and communication between agents.


Roadmap-Based Pursuit-Evasion in 3D Structures, Samuel Rodriguez, Jory Denny, Aditya Mahadevan, Jeremy Vu, Juan Burgos, Takis Zourntos, and Nancy M. Amato, International Conference on Computer Animation and Social Agents, Jan 2011.
Keywords: Motion Planning
Links : [Published]

BibTex

@inproceedings{rodriguez2011roadmap,
title={Roadmap-based pursuit-evasion in 3d structures},
author={Rodriguez, Samuel and Denny, Jory and Mahadevan, Aditya and Vu, Jeremy and Burgos, Juan and Zourntos, Takis and Amato, Nancy M},
booktitle={24th Intern. Conf. on Computer Animation and Social Agents (CASA)},
year={2011},
organization={Citeseer}
}


Abstract

We present an approach to the pursuit-evasion problem which is applicable to complex, multi-level environments. Studying each aspect of this problem in 3D structured environments is a distinct extension over many previous approaches. We also utilize our roadmap-based approach to multi-agent behavior when tracking agents of interest. Results are presented in three multi-level environments to highlight the search, pursuit and evasion components of the problem.


Speculative Parallelization of Loops, Lawrence Rauchwerger, Encyclopedia of Parallel Computing, pp. 1901-1912, Jan 2011. DOI: 10.1007/978-0-387-09766-4\_35
Keywords: Parallel Programming
Links : [Published]

BibTex

@incollection{DBLP:reference/parallel/Rauchwerger11,
author = {Lawrence Rauchwerger},
editor = {David A. Padua},
title = {Speculative Parallelization of Loops},
booktitle = {Encyclopedia of Parallel Computing},
pages = {1901--1912},
publisher = {Springer},
year = {2011},
url = {https://doi.org/10.1007/978-0-387-09766-4\_35},
doi = {10.1007/978-0-387-09766-4\_35},
timestamp = {Wed, 12 Jul 2017 09:11:52 +0200},
biburl = {https://dblp.org/rec/reference/parallel/Rauchwerger11.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

Speculative loop (thread) level parallelization is a compiler run-time technique that executes optimistically parallelized loops, verifies the correctness of their execution and, when necessary, backtracks to a safe state for possible re-execution. This technique includes a compiler (static) component for the transformation of the loop for speculative parallel execution as well as a run-time component which verifies correctness and re-executes when necessary.


Toward Simulating Realistic Pursuit-Evasion Using a Roadmap-Based Approach, amuel Rodriguez, Jory Denny, Takis Zourntos, Nancy M. Amato, International Conference on Motion in Games, pp. 82-93, Utrecht, Netherlands, Nov 2010. DOI: https://doi.org/10.1007/978-3-642-16958-8_9
Keywords: Motion Planning
Links : [Published]

BibTex

@inproceedings{rodriguez2010toward,
title={Toward simulating realistic pursuit-evasion using a roadmap-based approach},
author={Rodriguez, Samuel and Denny, Jory and Zourntos, Takis and Amato, Nancy M},
booktitle={International Conference on Motion in Games},
pages={82--93},
year={2010},
organization={Springer}
}


Abstract

In this work, we describe an approach for modeling and simulating group behaviors for pursuit-evasion that uses a graph-based representation of the environment and integrates multi-agent simulation with roadmap-based path planning. We demonstrate the utility of this approach for a variety of scenarios including pursuit-evasion on terrains, in multi-level buildings, and in crowds.


Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory, Roger Pearce, Maya Gokhale, Nancy M. Amato, Conference on High Performance Computing Networking, Storage and Analysis (SC), pp. 1-11, New Orleans, Louisiana, USA, Nov 2010. DOI: 10.1109/SC.2010.34
Keywords: Parallel Graph Algorithms
Links : [Published]

BibTex

@inproceedings{DBLP:conf/sc/PearceGA10,
author = {Roger A. Pearce and
Maya B. Gokhale and
Nancy M. Amato},
title = {Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External
Memory},
booktitle = {Conference on High Performance Computing Networking, Storage and Analysis,
{SC} 2010, New Orleans, LA, USA, November 13-19, 2010},
pages = {1--11},
publisher = {{IEEE}},
year = {2010},
url = {https://doi.org/10.1109/SC.2010.34},
doi = {10.1109/SC.2010.34},
timestamp = {Mon, 20 Apr 2020 17:14:43 +0200},
biburl = {https://dblp.org/rec/conf/sc/PearceGA10.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

Processing large graphs is becoming increasingly important for many domains such as social networks, bioinformatics, etc. Unfortunately, many algorithms and implementations do not scale with increasing graph sizes. As a result, researchers have attempted to meet the growing data demands using parallel and external memory techniques. We present a novel asynchronous approach to compute Breadth-First-Search (BFS), Single-Source-Shortest-Paths, and Connected Components for large graphs in shared memory. Our highly parallel asynchronous approach hides data latency due to both poor locality and delays in the underlying graph data storage. We present an experimental study applying our technique to both In-Memory and Semi-External Memory graphs utilizing multi-core processors and solid-state memory devices. Our experiments using synthetic and real-world datasets show that our asynchronous approach is able to overcome data latencies and provide significant speedup over alternative approaches. For example, on billion vertex graphs our asynchronous BFS scales up to 14 x on 16-cores.


The STAPL pView, Antal Buss, Adam Fidel, Harshvardhan, Timmie Smith, Gabriel Tanase, Nathan Thomas, Xiabing Xu, Mauro Bianco, Nancy M. Amato, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC), pp. 261-275, Houston, Texas, USA, Oct 2010. DOI: 10.1007/978-3-642-19595-2_18
Keywords: Parallel Containers, Parallel Programming, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-642-19595-2_18,
author=\"Buss, Antal
and Fidel, Adam
and Harshvardhan
and Smith, Timmie
and Tanase, Gabriel
and Thomas, Nathan
and Xu, Xiabing
and Bianco, Mauro
and Amato, Nancy M.
and Rauchwerger, Lawrence\",
editor=\"Cooper, Keith
and Mellor-Crummey, John
and Sarkar, Vivek\",
title=\"The STAPL pView\",
booktitle=\"Languages and Compilers for Parallel Computing\",
year=\"2011\",
publisher=\"Springer Berlin Heidelberg\",
address=\"Berlin, Heidelberg\",
pages=\"261--275\",
abstract=\"The Standard Template Adaptive Parallel Library (STAPL) is a C++ parallel programming library that provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. STAPL algorithms are written in terms of pViews, which provide a generic access interface to pContainer data by abstracting common data structure concepts. Briefly, pViews allow the same pContainer to present multiple interfaces, e.g., enabling the same pMatrix to be `viewed\' (or used) as a row-major or column-major matrix, or even as a vector. In this paper, we describe the staplpView concept and its properties. pViews generalize the iterator concept and enable parallelism by providing random access to, and an ADT for, collections of elements. We illustrate how pViews provide support for managing the tradeoff between expressivity and performance and examine the performance overhead incurred when using pViews.\",
isbn=\"978-3-642-19595-2\"
}


Abstract

The Standard Template Adaptive Parallel Library (STAPL) is a C++ parallel programming library that provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. STAPL algorithms are written in terms of pViews, which provide a generic access interface to pContainer data by abstracting common data structure concepts. Briefly, pViews allow the same pContainer to present multiple interfaces, e.g., enabling the same pMatrix to be ‘viewed’ (or used) as a row-major or column-major matrix, or even as a vector. In this paper, we describe the stapl pView concept and its properties. pViews generalize the iterator concept and enable parallelism by providing random access to, and an ADT for, collections of elements. We illustrate how pViews provide support for managing the tradeoff between expressivity and performance and examine the performance overhead incurred when using pViews.


Region Identification Methods for Efficient and Automated Motion Planning, Jory Denny, Anshul Agrawal, Evan Greco, Lydia Tapia, Nancy M. Amato, N/A, Sep 2010.
Keywords: Motion Planning
Links : [Published]

BibTex

@article{denny2010region,
title={Region Identification Methods for Efficient and Automated Motion Planning},
author={Denny, Jory and Agrawal, Anshul and Greco, Evan and Tapia, Lydia and Amato, Nancy M},
year={2010},
publisher={Citeseer}
}


Abstract

Realistic environments for motion planning problems typically consist of multiple areas, e.g., in a house there are rooms, hallways, and doorways. Decomposition of the environment into smaller areas, or regions, has been shown to impact the quality of planning for adaptive methods. Also other complex problems, such as planning for groups of agents with environmental preferences, can be aided with automated region identification. However, despite the advantages of regions, automatic region identification is not always easy, inexpensive, or consistent. In this paper we explore and compare three methods of automatic region identification and quantify the effects of the resulting regions on planning. The three strategies used for region identification are the statistical methods: k-means clustering, PG-means clustering, and Hierarchical clustering. First, we explore the differences in the regions produced by these three methods in a variety of rigid body and articulated linkage environments. Then, we quantitatively compare the usefulness of the region methods in automated motion planning through the application of the regions within the Unsupervised Adaptive Strategy for Motion Planning (UAS). Our results indicate both regions that seem intuitive and non-intuitive to the problems solve motion planning problems with increased efficiency and automation. That is, we found that planning generally benefitted from using regions, and the benefit was not tied to the naturalness of the region.


Reachable distance space: Efficient sampling-based planning for spatially constrained systems, Xinyu Tang, Shawna Thomas, Phillip Coleman, Nancy M. Amato, The International Journal of Robotics Research, Vol: 29, Issue: 7, pp. 916-934, Jun 2010. DOI: 10.1177/0278364909357643
Keywords: Motion Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{tang2010reachable,
title={Reachable distance space: Efficient sampling-based planning for spatially constrained systems},
author={Tang, Xinyu and Thomas, Shawna and Coleman, Phillip and Amato, Nancy M},
journal={The international journal of robotics research},
volume={29},
number={7},
pages={916--934},
year={2010},
publisher={SAGE Publications Sage UK: London, England}
}


Abstract

Motion planning for spatially constrained robots is difficult due to additional constraints placed on the robot, such as closure constraints for closed chains or requirements on end-effector placement for articulated linkages. It is usually computationally too expensive to apply sampling-based planners to these problems since it is difficult to generate valid configurations. We overcome this challenge by redefining the robot’s degrees of freedom and constraints into a new set of parameters, called reachable distance space (RD-space), in which all configurations lie in the set of constraint-satisfying subspaces. This enables us to directly sample the constrained subspaces with complexity linear in the number of the robot’s degrees of freedom. In addition to supporting efficient sampling of configurations, we show that the RD-space formulation naturally supports planning and, in particular, we design a local planner suitable for use by sampling-based planners. We demonstrate the effectiveness and efficiency of our approach for several systems including closed chain planning with multiple loops, restricted end-effector sampling, and on-line planning for drawing/sculpting. We can sample single-loop closed chain systems with 1,000 links in time comparable to open chain sampling, and we can generate samples for 1,000-link multi-loop systems of varying topologies in less than a second.


STAPL: Standard Template Adaptive Parallel Library, Antal Buss, Harshvardhan, Ioannis Papadopoulos, Olga Tkachyshyn, Timmie Smith, Gabriel Tanase, Nathan Thomas, Xiabing Xu, Mauro Bianco, Nancy M. Amato, Lawrence Rauchwerger, Proceedings of the 3rd Annual Haifa Experimental Systems Conference, Haifa, Israel, May 2010. DOI: 10.1145/1815695.1815713
Keywords: High-Performance Computing, Parallel Programming, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/1815695.1815713,
author = {Buss, Antal and Harshvardhan and Papadopoulos, Ioannis and Pearce, Olga and Smith, Timmie and Tanase, Gabriel and Thomas, Nathan and Xu, Xiabing and Bianco, Mauro and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {STAPL: Standard Template Adaptive Parallel Library},
year = {2010},
isbn = {9781605589084},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1815695.1815713},
doi = {10.1145/1815695.1815713},
abstract = {The Standard Template Adaptive Parallel Library (stapl) is a high-productivity parallel programming framework that extends C++ and stl with unified support for shared and distributed memory parallelism. stapl provides distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. The stapl runtime system provides the abstraction for communication and program execution. In this paper, we describe the major components of stapl and present performance results for both algorithms and data structures showing scalability up to tens of thousands of processors.},
booktitle = {Proceedings of the 3rd Annual Haifa Experimental Systems Conference},
articleno = {14},
numpages = {10},
keywords = {high productivity parallel programming, parallel data structures, library},
location = {Haifa, Israel},
series = {SYSTOR \'10}
}


Abstract

The Standard Template Adaptive Parallel Library (stapl) is a high-productivity parallel programming framework that extends C++ and stl with unified support for shared and distributed memory parallelism. stapl provides distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. The stapl runtime system provides the abstraction for communication and program execution. In this paper, we describe the major components of stapl and present performance results for both algorithms and data structures showing scalability up to tens of thousands of processors.


Behavior-Based Evacuation Planning, Samuel Rodriguez, Nancy M. Amato, IEEE International Conference on Robotics and Automation, May 2010. DOI: 10.1109/ROBOT.2010.5509502
Keywords: Group Behavior, Multi-Agent, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{rodriguez2010behavior,
title={Behavior-based evacuation planning},
author={Rodriguez, Samuel and Amato, Nancy M},
booktitle={2010 IEEE International Conference on Robotics and Automation},
pages={350--355},
year={2010},
organization={IEEE}
}


Abstract

In this work, we present a formulation of an evacuation planning problem that is inspired by motion planning and describe an integrated behavioral agent-based and roadmap-based motion planning approach to solve it. Our formulation allows users to test the effect on evacuation of a number of different environmental factors. One of our main focuses is to provide a mechanism to investigate how the interaction between agents influences the resulting evacuation plans. Specifically, we explore how various types of control provided by a set of directing agents effects the overall evacuation planning strategies of the evacuating agents.


A Motion Planning Approach to Studying Molecular Motions, Lydia Tapia, Shawna Thomas, Nancy M. Amato, Communications in Information and Systems, Vol: 10, Issue: 1, pp. 53-68, Jan 2010. DOI: 10.4310/cis.2010.v10.n1.a4
Keywords: Computational Biology, Motion Planning, Protein Folding
Links : [Published]

BibTex

@article{tapia-cis-2010-j
, author = "Lydia Tapia and Shawna Thomas and Nancy M. Amato"
, title = "A Motion Planning Approach to Studying Molecular Motions"
, journal = "Communications in Information and Systems"
, volume = "10"
, issue = "1"
, year = "2010"
, pages = "52-68"
, doi = "10.4310/CIS.2010.v10.n1.a4"
, note = "Special Issue in honor of Michael Waterman. Open Access. https://projecteuclid.org/euclid.cis/1268143373"
}


Abstract

While structurally very different, protein and RNA molecules share an important attribute. The motions they undergo are strongly related to the function they perform. For example, many diseases such as Mad Cow disease or Alzheimer’s disease are associated with protein misfolding and aggregation. Similarly, RNA folding velocity may regulate the plasmid copy number, and RNA folding kinetics can regulate gene expression at the translational level. Knowledge of the stability, folding, kinetics and detailed mechanics of the folding process may help provide insight into how proteins and RNAs fold. In this paper, we present an overview of our work with a computational method we have adapted from robotic motion planning to study molecular motions. We have validated against experimental data and have demonstrated that our method can capture biological results such as stochastic folding pathways, population kinetics of various conformations, and relative folding rates. Thus, our method provides both a detailed view (e.g., individual pathways) and a global view (e.g., population kinetics, relative folding rates, and reaction coordinates) of energy landscapes of both proteins and RNAs. We have validated these techniques by showing that we observe the same relative folding rates as shown in experiments for structurally similar protein molecules that exhibit different folding behaviors. Our analysis has also been able to predict the same relative gene expression rate for wild-type MS2 phage RNA and three of its mutants.


The STAPL pList, Gabriel Tanase, Xiabing Xu, Antal Buss, Harshvardhan, Ioannis Papadopoulos, Olga Tkachyshyn, Timmie Smith, Nathan Thomas, Mauro Bianco, Nancy M. Amato, Lawrence Rauchwerger, Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing, pp. 16-30, Newark, DE, USA, Oct 2009. DOI: 10.1007/978-3-642-13374-9_2
Keywords: Parallel Containers, Parallel Programming, STAPL
Links : [Published]

BibTex

@inproceedings{10.1007/978-3-642-13374-9_2,
author = {Tanase, Gabriel and Xu, Xiabing and Buss, Antal and Harshvardhan and Papadopoulos, Ioannis and Pearce, Olga and Smith, Timmie and Thomas, Nathan and Bianco, Mauro and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {The STAPL Plist},
year = {2009},
isbn = {3642133738},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
url = {https://doi.org/10.1007/978-3-642-13374-9_2},
doi = {10.1007/978-3-642-13374-9_2},
abstract = {We present the design and implementation of the staplpList, a parallel container that has the properties of a sequential list, but allows for scalable concurrent access when used in a parallel program. The Standard Template Adaptive Parallel Library (stapl) is a parallel programming library that extends C++ with support for parallelism. stapl provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. staplpContainers are thread-safe, concurrent objects, providing appropriate interfaces (e.g., views) that can be used by generic pAlgorithms. The pList provides stl equivalent methods, such as insert, erase, and splice, additional methods such as split, and efficient asynchronous (non-blocking) variants of some methods for improved parallel performance. We evaluate the performance of the staplpList on an IBM Power 5 cluster and on a CRAY XT4 massively parallel processing system. Although lists are generally not considered good data structures for parallel processing, we show that pList methods and pAlgorithms (p_generate and p_partial_sum) operating on pLists provide good scalability on more than 103 processors and that pList compares favorably with other dynamic data structures such as the pVector.},
booktitle = {Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing},
pages = {16–30},
numpages = {15},
location = {Newark, DE},
series = {LCPC\'09}
}


Abstract

We present the design and implementation of the staplpList, a parallel container that has the properties of a sequential list, but allows for scalable concurrent access when used in a parallel program. The Standard Template Adaptive Parallel Library (stapl) is a parallel programming library that extends C++ with support for parallelism. stapl provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. stapl pContainers are thread-safe, concurrent objects, providing appropriate interfaces (e.g., views) that can be used by generic pAlgorithms. The pList provides stl equivalent methods, such as insert, erase, and splice, additional methods such as split, and efficient asynchronous (non-blocking) variants of some methods for improved parallel performance. We evaluate the performance of the stapl pList on an IBM Power 5 cluster and on a CRAY XT4 massively parallel processing system. Although lists are generally not considered good data structures for parallel processing, we show that pList methods and pAlgorithms (p_generate and p_partial_sum) operating on pLists provide good scalability on more than 1000 processors and that pList compares favorably with other dynamic data structures such as the pVector.


Two Memory Allocators that Use Hints to Improve Locality, Alin Jula, Lawrence Rauchwerger, Proceedings of the 2009 International Symposium on Memory Management (ISMM), pp. 109-118, Dublin, Ireland, Jun 2009. DOI: 10.1145/1542431.1542447
Keywords: Memory Management, Operating Systems, Optimizing Compilers
Links : [Published]

BibTex

@inproceedings{10.1145/1542431.1542447,
author = {Jula, Alin and Rauchwerger, Lawrence},
title = {Two Memory Allocators That Use Hints to Improve Locality},
year = {2009},
isbn = {9781605583471},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1542431.1542447},
doi = {10.1145/1542431.1542447},
abstract = {Dynamic memory allocators affect an application\'s performance through their data layout quality. They can use an application\'s allocation hints to improve the spatial locality of this layout. However, a practical approach needs to be automatic, without user intervention. In this paper we present two locality improving allocators, that use allocation hints provided automatically from the C++ STL library to improve an application\'s spatial locality. When compared to state-of-the-art allocators on seven real world applications, our allocators run on average 7% faster than the Lea allocator, and 17% faster than the FreeBSD\'s allocator, with the same memory fragmentation as the Lea allocator, one of the best allocators.While considering locality as an important goal, locality improving allocators must not abandon the existing constraints of fast allocation speed and low fragmentation. These constraints further challenge their design and implementation. We experimentally show that within a memory allocator, allocation speed, memory fragmentation, and spatial locality compete with each other in a game of rock, paper, scissors: when one tries to improve one trait, the others suffer. We conclude that our allocators achieve a good balance of these traits, and they can easily be adjusted to optimize the most important trait for each application.},
booktitle = {Proceedings of the 2009 International Symposium on Memory Management},
pages = {109–118},
numpages = {10},
location = {Dublin, Ireland},
series = {ISMM \'09}
}


Abstract

Dynamic memory allocators affect an application\'s performance through their data layout quality. They can use an application\'s allocation hints to improve the spatial locality of this layout. However, a practical approach needs to be automatic, without user intervention. In this paper we present two locality improving allocators, that use allocation hints provided automatically from the C++ STL library to improve an application\'s spatial locality. When compared to state-of-the-art allocators on seven real world applications, our allocators run on average 7% faster than the Lea allocator, and 17% faster than the FreeBSD\'s allocator, with the same memory fragmentation as the Lea allocator, one of the best allocators. While considering locality as an important goal, locality improving allocators must not abandon the existing constraints of fast allocation speed and low fragmentation. These constraints further challenge their design and implementation. We experimentally show that within a memory allocator, allocation speed, memory fragmentation, and spatial locality compete with each other in a game of rock, paper, scissors: when one tries to improve one trait, the others suffer. We conclude that our allocators achieve a good balance of these traits, and they can easily be adjusted to optimize the most important trait for each application.


Design for Interoperability in STAPL: pMatrices and Linear Algebra Algorithms, Antal Buss, Timmie Smith, Gabriel Tanase, Nathan Thomas, Mauro Bianco, Nancy M. Amato, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC 2008). Lecture Notes in Computer Science, vol 5335. Springer, pp. 304-315, Edmonton, Alberta, Canada, Jul 2008. DOI: 10.1007/978-3-540-89740-8_21
Keywords: Parallel Containers, Parallel Programming, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-540-89740-8_21,
author=\"Buss, Antal A.
and Smith, Timmie G.
and Tanase, Gabriel
and Thomas, Nathan L.
and Bianco, Mauro
and Amato, Nancy M.
and Rauchwerger, Lawrence\",
editor=\"Amaral, Jos{\\\'e} Nelson\",
title=\"Design for Interoperability in stapl: pMatrices and Linear Algebra Algorithms\",
booktitle=\"Languages and Compilers for Parallel Computing\",
year=\"2008\",
publisher=\"Springer Berlin Heidelberg\",
address=\"Berlin, Heidelberg\",
pages=\"304--315\",
abstract=\"The Standard Template Adaptive Parallel Library (stapl) is a high-productivity parallel programming framework that extends C++ and stl with unified support for shared and distributed memory parallelism. stapl provides distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. To improve productivity and performance, it is essential for stapl to exploit third party libraries, including those developed in programming languages other than C++. In this paper we describe a methodology that enables third party libraries to be used with stapl. This methodology allows a developer to specify when these specialized libraries can correctly be used, and provides mechanisms to transparently invoke them when appropriate. It also provides support for using staplpAlgorithms and pContainers in external codes. As a concrete example, we illustrate how third party libraries, namely BLAS and PBLAS, can be transparently embedded into stapl to provide efficient linear algebra algorithms for the staplpMatrix, with negligible slowdown with respect to the optimized libraries themselves.\",
isbn=\"978-3-540-89740-8\"
}


Abstract

The Standard Template Adaptive Parallel Library (stapl) is a high-productivity parallel programming framework that extends C++ and stl with unified support for shared and distributed memory parallelism. stapl provides distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. To improve productivity and performance, it is essential for stapl to exploit third party libraries, including those developed in programming languages other than C++. In this paper we describe a methodology that enables third party libraries to be used with stapl. This methodology allows a developer to specify when these specialized libraries can correctly be used, and provides mechanisms to transparently invoke them when appropriate. It also provides support for using stapl pAlgorithms and pContainers in external codes. As a concrete example, we illustrate how third party libraries, namely BLAS and PBLAS, can be transparently embedded into stapl to provide efficient linear algebra algorithms for the stapl pMatrix, with negligible slowdown with respect to the optimized libraries themselves.


RESAMPL: A Region-Sensitive Adaptive Motion Planner, Samuel Rodriguez, Shawna Thomas, Roger Pearce, Nancy M. Amato, Algorithmic Foundation of Robotics VII , pp. 285-300, N/A, Jan 2008. DOI: 10.1007/978-3-540-68405-3_18
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@Inbook{Rodriguez2008,
author="Rodriguez, Samuel
and Thomas, Shawna
and Pearce, Roger
and Amato, Nancy M.",
editor="Akella, Srinivas
and Amato, Nancy M.
and Huang, Wesley H.
and Mishra, Bud",
title="RESAMPL: A Region-Sensitive Adaptive Motion Planner",
bookTitle="Algorithmic Foundation of Robotics VII: Selected Contributions of the Seventh International Workshop on the Algorithmic Foundations of Robotics",
year="2008",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="285--300",
isbn="978-3-540-68405-3",
doi="10.1007/978-3-540-68405-3_18",
url="https://doi.org/10.1007/978-3-540-68405-3_18"
}


Abstract

Automatic motion planning has applications ranging from traditional robotics to computer-aided design to computational biology and chemistry. While randomized planners, such as probabilistic roadmap methods (PRMs) or rapidly-exploring random trees (RRT), have been highly successful in solving many high degree of freedom problems, there are still many scenarios in which we need better methods, e.g., problems involving narrow passages or which contain multiple regions that are best suited to different planners. In this work, we present RESAMPL, a motion planning strategy that uses local region information to make intelligent decisions about how and where to sample, which samples to connect together, and to find paths through the environment. Briefly, RESAMPL classifies regions based on the entropy of the samples in it, and then uses these classifications to further refine the sampling. Regions are placed in a region graph that encodes relationships between regions, e.g., edges correspond to overlapping regions. The strategy for connecting samples is guided by the region graph, and can be exploited in both multi-query and single-query scenarios. Our experimental results comparing RESAMPL to previous multi-query and single-query methods show that RESAMPL is generally significantly faster and also usually requires fewer samples to solve the problem.


Incremental Map Generation (IMG), Dawen Xie, Marco Morales, Roger Pearce, Shawna Thomas, Jyh-Ming Lien, Nancy M. Amato, Algorithmic Foundation of Robotics VII, N/A, Jan 2008. DOI: 10.1007/978-3-540-68405-3_4
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@Inbook{Xie2008,
author="Xie, Dawen
and Morales, Marco
and Pearce, Roger
and Thomas, Shawna
and Lien, Jyh-Ming
and Amato, Nancy M.",
editor="Akella, Srinivas
and Amato, Nancy M.
and Huang, Wesley H.
and Mishra, Bud",
title="Incremental Map Generation (IMG)",
bookTitle="Algorithmic Foundation of Robotics VII: Selected Contributions of the Seventh International Workshop on the Algorithmic Foundations of Robotics",
year="2008",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="53--68",
isbn="978-3-540-68405-3",
doi="10.1007/978-3-540-68405-3_4",
url="https://doi.org/10.1007/978-3-540-68405-3_4"
}


Abstract

Probabilistic roadmap methods (prms) have been highly successful in solving many high degree of freedom motion planning problems arising in diverse application domains such as traditional robotics, computer-aided design, and computational biology and chemistry. One important practical issue with prms is that they do not provide an automated mechanism to determine how large a roadmap is needed for a given problem. Instead, users typically determine this by trial and error and as a consequence often construct larger roadmaps than are needed. In this paper, we propose a new prm-based framework called Incremental Map Generation (img) to address this problem. Our strategy is to break the map generation into several processes, each of which generates samples and connections, and to continue adding the next increment of samples and connections to the evolving roadmap until it stops improving. In particular, the process continues until a set of evaluation criteria determine that the planning strategy is no longer effective at improving the roadmap. We propose some general evaluation criteria and show how to apply them to construct different types of roadmaps, e.g., roadmaps that coarsely or more finely map the space. In addition, we show how img can be integrated with previously proposed adaptive strategies for selecting sampling methods. We provide results illustrating the power of img.


Incremental Map Generation (IMG), Dawen Xie, Marco Morales, Roger Pearce, Shawna Thomas, Jyh-Ming Lien, Nancy M. Amato, Algorithmic Foundation of Robotics VII, N/A, Jan 2008. DOI: 10.1007/978-3-540-68405-3_4
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@Inbook{Xie2008,
author="Xie, Dawen
and Morales, Marco
and Pearce, Roger
and Thomas, Shawna
and Lien, Jyh-Ming
and Amato, Nancy M.",
editor="Akella, Srinivas
and Amato, Nancy M.
and Huang, Wesley H.
and Mishra, Bud",
title="Incremental Map Generation (IMG)",
bookTitle="Algorithmic Foundation of Robotics VII: Selected Contributions of the Seventh International Workshop on the Algorithmic Foundations of Robotics",
year="2008",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="53--68",
isbn="978-3-540-68405-3",
doi="10.1007/978-3-540-68405-3_4",
url="https://doi.org/10.1007/978-3-540-68405-3_4"
}


Abstract

Probabilistic roadmap methods (prms) have been highly successful in solving many high degree of freedom motion planning problems arising in diverse application domains such as traditional robotics, computer-aided design, and computational biology and chemistry. One important practical issue with prms is that they do not provide an automated mechanism to determine how large a roadmap is needed for a given problem. Instead, users typically determine this by trial and error and as a consequence often construct larger roadmaps than are needed. In this paper, we propose a new prm-based framework called Incremental Map Generation (img) to address this problem. Our strategy is to break the map generation into several processes, each of which generates samples and connections, and to continue adding the next increment of samples and connections to the evolving roadmap until it stops improving. In particular, the process continues until a set of evaluation criteria determine that the planning strategy is no longer effective at improving the roadmap. We propose some general evaluation criteria and show how to apply them to construct different types of roadmaps, e.g., roadmaps that coarsely or more finely map the space. In addition, we show how img can be integrated with previously proposed adaptive strategies for selecting sampling methods. We provide results illustrating the power of img.


Associative Parallel Containers In STAPL, Gabriel Tanase, Chidambareswaran (Chids) Raman, Mauro Bianco, Nancy M. Amato, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC), pp. 156-171, Urbana, IL, USA, Oct 2007. DOI: 10.1007/978-3-540-85261-2_11
Keywords: Parallel Containers, Parallel Programming, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-540-85261-2_11,
author=\"Tanase, Gabriel
and Raman, Chidambareswaran
and Bianco, Mauro
and Amato, Nancy M.
and Rauchwerger, Lawrence\",
editor=\"Adve, Vikram
and Garzar{\\\'a}n, Mar{\\\'i}a Jes{\\\'u}s
and Petersen, Paul\",
title=\"Associative Parallel Containers in STAPL\",
booktitle=\"Languages and Compilers for Parallel Computing\",
year=\"2008\",
publisher=\"Springer Berlin Heidelberg\",
address=\"Berlin, Heidelberg\",
pages=\"156--171\",
abstract=\"The Standard Template Adaptive Parallel Library (stapl) is a parallel programming framework that extends C++ and stl with support for parallelism. stapl provides a collection of parallel data structures (pContainers) and algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. staplpContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They also provide appropriate interfaces that can be used by generic pAlgorithms. In this work, we present the design and implementation of the stapl associative pContainers: pMap, pSet, pMultiMap, pMultiSet, pHashMap, and pHashSet. These containers provide optimal insert, search, and delete operations for a distributed collection of elements based on keys. Their methods include counterparts of the methods provided by the stl associative containers, and also some asynchronous (non-blocking) variants that can provide improved performance in parallel. We evaluate the performance of the stapl associative pContainers on an IBM Power5 cluster, an IBM Power3 cluster, and on a linux-based Opteron cluster, and show that the new pContainer asynchronous methods, generic pAlgorithms (e.g., pfind) and a sort application based on associative pContainers, all provide good scalability on more than 1000 processors.\",
isbn=\"978-3-540-85261-2\"
}


Abstract

The Standard Template Adaptive Parallel Library (stapl) is a parallel programming framework that extends C++ and stl with support for parallelism. stapl provides a collection of parallel data structures (pContainers) and algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. stapl pContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They also provide appropriate interfaces that can be used by generic pAlgorithms. In this work, we present the design and implementation of the stapl associative pContainers: pMap, pSet, pMultiMap, pMultiSet, pHashMap, and pHashSet. These containers provide optimal insert, search, and delete operations for a distributed collection of elements based on keys. Their methods include counterparts of the methods provided by the stl associative containers, and also some asynchronous (non-blocking) variants that can provide improved performance in parallel. We evaluate the performance of the stapl associative pContainers on an IBM Power5 cluster, an IBM Power3 cluster, and on a linux-based Opteron cluster, and show that the new pContainer asynchronous methods, generic pAlgorithms (e.g., pfind) and a sort application based on associative pContainers, all provide good scalability on more than 1000 processors.


The STAPL pArray, Gabriel Tanase, Mauro Bianco, Nancy M. Amato, Lawrence Rauchwerger, Proceedings of the 2007 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture (MEDEA), pp. 73-80, Brasov, Romania, Sep 2007. DOI: 10.1145/1327171.1327180
Keywords: Parallel Containers, Parallel Programming, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/1327171.1327180,
author = {Tanase, Gabriel and Bianco, Mauro and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {The STAPL PArray},
year = {2007},
isbn = {9781595938077},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1327171.1327180},
doi = {10.1145/1327171.1327180},
abstract = {The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming framework that extends C++ and STL with support for parallelism. STAPL provides parallel data structures (pContainers) and generic parallel algorithms (pAlgorithms), and a methodology for extending them to provide customized functionality. STAPL pContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They provide views as a generic means to access data that can be passed as input to generic pAlgorithms.In this work, we present the STAPL pArray, the parallel equivalent of the sequential STL valarray, a fixed-size data structure optimized for storing and accessing data based on one-dimensional indices. We describe the pArray design and show how it can support a variety of underlying data distribution policies currently available in STAPL, such as blocked or blocked cyclic. We provide experimental results showing that pAlgorithms using the pArray scale well to more than 2,000 processors. We also provide results using different data distributions that illustrate that the performance of pAlgorithms and pArray methods is usually sensitive to the underlying data distribution, and moreover, that there is no one data distribution that performs best for all pAlgorithms, processor counts, or machines.},
booktitle = {Proceedings of the 2007 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture},
pages = {73–80},
numpages = {8},
location = {Brasov, Romania},
series = {MEDEA \'07}
}


Abstract

The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming framework that extends C++ and STL with support for parallelism. STAPL provides parallel data structures (pContainers) and generic parallel algorithms (pAlgorithms), and a methodology for extending them to provide customized functionality. STAPL pContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They provide views as a generic means to access data that can be passed as input to generic pAlgorithms. In this work, we present the STAPL pArray, the parallel equivalent of the sequential STL valarray, a fixed-size data structure optimized for storing and accessing data based on one-dimensional indices. We describe the pArray design and show how it can support a variety of underlying data distribution policies currently available in STAPL, such as blocked or blocked cyclic. We provide experimental results showing that pAlgorithms using the pArray scale well to more than 2,000 processors. We also provide results using different data distributions that illustrate that the performance of pAlgorithms and pArray methods is usually sensitive to the underlying data distribution, and moreover, that there is no one data distribution that performs best for all pAlgorithms, processor counts, or machines.


Kinetics Analysis Methods for Approximate Folding Landscapes , Lydia Tapia, Xinyu Tang, Shawna Thomas, Nancy M. Amato, Bioinformatics, Vol: Volume 23, Issue: Issue 13, pp. i539- i548, Jul 2007. DOI: 10.1093/bioinformatics/btm199
Keywords: Protein Folding, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{tapia2007kinetics,
title={Kinetics analysis methods for approximate folding landscapes},
author={Tapia, Lydia and Tang, Xinyu and Thomas, Shawna and Amato, Nancy M},
journal={Bioinformatics},
volume={23},
number={13},
pages={i539--i548},
year={2007},
publisher={Oxford University Press}
}


Abstract

Motivation: Protein motions play an essential role in many biochemical processes. Lab studies often quantify these motions in terms of their kinetics such as the speed at which a protein folds or the population of certain interesting states like the native state. Kinetic metrics give quantifiable measurements of the folding process that can be compared across a group of proteins such as a wild-type protein and its mutants. Results: We present two new techniques, map-based master equation solution and map-based Monte Carlo simulation, to study protein kinetics through folding rates and population kinetics from approximate folding landscapes, models called maps. From these two new techniques, interesting metrics that describe the folding process, such as reaction coordinates, can also be studied. In this article we focus on two metrics, formation of helices and structure formation around tryptophan residues. These two metrics are often studied in the lab through circular dichroism (CD) spectra analysis and tryptophan fluorescence experiments, respectively. The approximated landscape models we use here are the maps of protein conformations and their associated transitions that we have presented and validated previously. In contrast to other methods such as the traditional master equation and Monte Carlo simulation, our techniques are both fast and can easily be computed for full-length detailed protein models. We validate our map-based kinetics techniques by comparing folding rates to known experimental results. We also look in depth at the population kinetics, helix formation and structure near tryptophan residues for a variety of proteins.


Sensitivity Analysis for Automatic Parallelization on Multi-Cores, Silvius Rus, Maikel Pennings, Lawrence Rauchwerger, Proceedings of the 21st Annual International Conference on Supercomputing (ICS), pp. 263-273, Seattle, Washington, USA, Jun 2007. DOI: 10.1145/1274971.1275008
Keywords: Optimizing Compilers, Parallelising Compilers
Links : [Published]

BibTex

@inproceedings{10.1145/1274971.1275008,
author = {Rus, Silvius and Pennings, Maikel and Rauchwerger, Lawrence},
title = {Sensitivity Analysis for Automatic Parallelization on Multi-Cores},
year = {2007},
isbn = {9781595937681},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1274971.1275008},
doi = {10.1145/1274971.1275008},
abstract = {Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis for the cases when relevant program behavior is input sensitive. In this paper we show how SA can extract all the input dependent, statically unavailable, conditions for which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of their complexity, can each validate the dynamic parallel execution of the corresponding loop. For example, SA can first attempt to validate parallelization by checking simple conditions related to loop bounds. If such simple conditions cannot be met, then validating dynamic parallelization may require evaluating conditions related to the entire memory reference trace of a loop, thus decreasing the benefits of parallel execution.We have implemented Sensitivity Analysis in the Polaris compiler and evaluated its performance using 22 industry standard benchmark codes running on two multicore systems. In most cases we have obtained speedups superior to the Intel Ifort compiler because with SA we could complement static analysis with minimum cost dynamic analysis and extract most of the available coarse grained parallelism.},
booktitle = {Proceedings of the 21st Annual International Conference on Supercomputing},
pages = {263–273},
numpages = {11},
keywords = {parallelism, sensitivity analysis, multicore},
location = {Seattle, Washington},
series = {ICS '07}
}


Abstract

Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis for the cases when relevant program behavior is input sensitive. In this paper we show how SA can extract all the input dependent, statically unavailable, conditions for which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of their complexity, can each validate the dynamic parallel execution of the corresponding loop. For example, SA can first attempt to validate parallelization by checking simple conditions related to loop bounds. If such simple conditions cannot be met, then validating dynamic parallelization may require evaluating conditions related to the entire memory reference trace of a loop, thus decreasing the benefits of parallel execution. We have implemented Sensitivity Analysis in the Polaris compiler and evaluated its performance using 22 industry standard benchmark codes running on two multicore systems. In most cases we have obtained speedups superior to the Intel Ifort compiler because with SA we could complement static analysis with minimum cost dynamic analysis and extract most of the available coarse grained parallelism.


Biasing Samplers to Improve Motion Planning Performance, Shawna Thomas, Marco Morales, Xinyu Tang, Nancy M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Roma, Italy, Apr 2007. DOI: 10.1109/ROBOT.2007.363556
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{4209320,

author={S. {Thomas} and M. {Morales} and X. {Tang} and N. M. {Amato}},

booktitle={Proceedings 2007 IEEE International Conference on Robotics and Automation},

title={Biasing Samplers to Improve Motion Planning Performance},

year={2007},

volume={},

number={},

pages={1625-1630},

doi={10.1109/ROBOT.2007.363556}}


Abstract

With the success of randomized sampling-based motion planners such as probabilistic roadmap methods, much work has been done to design new sampling techniques and distributions. To date, there is no sampling technique that outperforms all other techniques for all motion planning problems. Instead, each proposed technique has different strengths and weaknesses. However, little work has been done to combine these techniques to create new distributions. In this paper, we propose to bias one sampling distribution with another such that the resulting distribution out-performs either of its parent distributions. We present a general framework for biasing samplers that is easily extendable to new distributions and can handle an arbitrary number of parent distributions by chaining them together. Our experimental results show that by combining distributions, we can out-perform existing planners. Our results also indicate that not one single distribution combination performs the best in all problems, and we identify which perform better for the specific application domains studied.


Planning with Reachable Distances: Fast Enforcement of Closure Constraints, Xinyu Tang, Shawna Thomas, Nancy M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Roma, Italy, Apr 2007. DOI: 10.1109/ROBOT.2007.363872
Keywords: Manipulation Planning, Reachability Analysis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{4209490, author={X. {Tang} and S. {Thomas} and N. M. {Amato}}, booktitle={Proceedings 2007 IEEE International Conference on Robotics and Automation}, title={Planning with Reachable Distances: Fast Enforcement of Closure Constraints}, year={2007}, volume={}, number={}, pages={2694-2699}, doi={10.1109/ROBOT.2007.363872}}


Abstract

Motion planning for closed-chain systems is particularly difficult due to additional closure constraints placed on the system. In fact, the probability of randomly selecting a set of joint angles that satisfy the closure constraints is zero. We propose planning with reachable distance (PRD) to overcome this challenge by first precomputing the subspace satisfying the closure constraints, then directly sampling in it. To do so, we represent the chain as a hierarchy of sub-chains. Then we calculate the "closure" sub-space as appropriate reachable distance ranges of sub-chains satisfying the closure constraints. This provides two distinct advantages over traditional approaches: (1) configurations are quickly sampled and converted to joint angles using basic trigonometry functions instead of more expensive inverse kinematics solvers, and (2) configurations are guaranteed to be closed. In this paper, we describe this hierarchical chain representation and give a sampling algorithm with complexity linear in the number of links. We provide the necessary motion planning primitives for most sampling-based motion planners. Our experimental results show our method is fast, making sampling closed configurations comparable to sampling open chain configurations that ignore closure constraints. Our method is general, easy to implement, and also extends to other distance-related constraints besides the ones demonstrated here


Tools for Simulating and Analyzing RNA Folding Kinetics, Xinyu Tang, Shawna Thomas, Lydia Tapia, Nancy M. Amato, In. Proc. International Conference on Research in Computational Molecular Biology, Oaklanda, California, USA, Apr 2007. DOI: 10.1007/978-3-540-71681-5_19
Keywords: Computational Biology, RNA, Sampling-Based Motion Planning
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-540-71681-5_19,
author="Tang, Xinyu
and Thomas, Shawna
and Tapia, Lydia
and Amato, Nancy M.",
editor="Speed, Terry
and Huang, Haiyan",
title="Tools for Simulating and Analyzing RNA Folding Kinetics",
booktitle="Research in Computational Molecular Biology",
year="2007",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="268--282",
isbn="978-3-540-71681-5"
}


Abstract

It has recently been found that some RNA functions are determined by the actual folding kinetics and not just the RNA’s nucleotide sequence or its native structure. We present new computational tools for simulating and analyzing RNA folding kinetic metrics such as population kinetics, folding rates, and the folding of particular subsequences. Our method first builds an approximate representation (called a map) of the RNA’s folding energy landscape, and then uses specialized analysis techniques to extract folding kinetics from the map. We provide a new sampling strategy called Probabilistic Boltzmann Sampling (PBS) that enables us to approximate the folding landscape with much smaller maps, typically by several orders of magnitude. We also describe a new analysis technique, Map-based Monte Carlo (MMC) simulation, to stochastically extract folding pathways from the map. We demonstrate that our technique can be applied to large RNA (e.g., 200+ nucleotides), where representing the full landscape is infeasible, and that our tools provide results comparable to other simulation methods that work on complete energy landscapes. We present results showing that our approach computes the same relative functional rates as seen in experiments for the relative plasmid replication rates of ColE1 RNAII and its mutants, and for the relative gene expression rates of MS2 phage RNA and its mutants.


Sensitivity Analysis for Migrating Programs to Multi-Cores, Silvius Rus, Marinus Pennings, Lawrence Rauchwerger, Technical Report, TR06-015, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, USA, Dec 2006.
Keywords: Data Dependence, Optimizing Compilers, Parallelising Compilers
Links : [Manuscript]

BibTex

@techreport{srus-tr06-015-2006,
title = "Sensitivity Analysis for Migrating Programs to Multi-Cores",
author = "Rus, Silvius Rus and Pennings, Marinus and Rauchwerger, Lawrence",
institution = "Parasol Laboratory, Department of Computer Science, Texas A&M University",
address = "College Station, TX, USA",
number = "TR06-015",
year = 2006,
month = dec
}


Abstract

Recently, a technique known as Hybrid Analysis has demonstrated the possibility of automatic coarse grain program parallelization through an integrated combination of static and dynamic analysis techniques. The recent introduction of multi-cores for the mass market has on the one hand exacerbated the need for such a technology and, on the other hand, changed the cost/benefit ratio of paralleliza- tion. Multi-Cores have low communication costs but the degree of parallelism is relatively small (¡100 processors for the next 5 years). Therefore hybrid parallelization techniques (static/dynamic analysis) need to keep their dynamic overhead very low in order to benefit multi-core systems. The Sensitivity Analysis (SA), presented in this paper is a novel technique that can reduce the dynamic overhead of previous Hybrid Analysis technology. Instead of dynamically evaluating the ag- gregated memory reference sets representing the potential data dependences, SA can frequently extract light weight, sufficient conditions for which these dependence sets are empty. The cost of the run-time evaluation of these conditions, named predicates, is low enough to make parallelization profitable on multi-cores. We have implemented the SA technology in the Polaris compiler and evaluated its performance on 22 benchmark SPEC and PERFECT codes. We have obtained speedups superior to the Intel Ifort compiler in most cases because we could extract most of the available coarse grained parallelism.


Approximate Convex Decomposition and its Applications, Jyh-Ming Lien, Doctoral dissertation, Texas A & M University, College Station, Texas, USA, Dec 2006. DOI: N/A
Keywords: Computational Geometry, Convex Decomposition
Links : [Published]

BibTex

@phdthesis{lien2006approximate,
title={Approximate convex decomposition and its applications},
author={Lien, Jyh-Ming},
volume={69},
number={01},
year={2006}
}


Abstract

Geometric computations are essential in many real-world problems. One important issue in geometric computations is that the geometric models in these problems can be so large that computations on them have infeasible storage or computation time requirements. Decomposition is a technique commonly used to partition complex models into simpler components. Whereas decomposition into convex components results in pieces that are easy to process, such decompositions can be costly to construct and can result in representations with an unmanageable number of components. In this work, we have developed an approximate technique, called Approximate Convex Decomposition (ACD), which decomposes a given polygon or polyhedron into "approximately convex" pieces that may provide similar benefits as convex components, while the resulting decomposition is both significantly smaller (typically by orders of magnitude) and can be computed more efficently. Indeed, for many applications, an ACD can represent the important structural features of the model more accurately by providing a mechanism for ignoring less significant features, such as wrinkles and surface texture. Our study of a wide range of applications shows that in addition to providing computational efficiency, ACD also provides natural multi-resolution or hierarchical representations. In this dissertation, we provide some examples of ACD's many potential applications, such as particle simulation, mesh generation, motion planning, and skeleton extraction.


Approximate Convex Decomposition and its Applications, Jyh-Ming Lien, Doctoral dissertation, Texas A & M University, College Station, Texas, USA, Dec 2006. DOI: N/A
Keywords: Computational Geometry, Convex Decomposition
Links : [Published]

BibTex

@phdthesis{lien2006approximate,
title={Approximate convex decomposition and its applications},
author={Lien, Jyh-Ming},
volume={69},
number={01},
year={2006}
}


Abstract

Geometric computations are essential in many real-world problems. One important issue in geometric computations is that the geometric models in these problems can be so large that computations on them have infeasible storage or computation time requirements. Decomposition is a technique commonly used to partition complex models into simpler components. Whereas decomposition into convex components results in pieces that are easy to process, such decompositions can be costly to construct and can result in representations with an unmanageable number of components. In this work, we have developed an approximate technique, called Approximate Convex Decomposition (ACD), which decomposes a given polygon or polyhedron into "approximately convex" pieces that may provide similar benefits as convex components, while the resulting decomposition is both significantly smaller (typically by orders of magnitude) and can be computed more efficently. Indeed, for many applications, an ACD can represent the important structural features of the model more accurately by providing a mechanism for ignoring less significant features, such as wrinkles and surface texture. Our study of a wide range of applications shows that in addition to providing computational efficiency, ACD also provides natural multi-resolution or hierarchical representations. In this dissertation, we provide some examples of ACD's many potential applications, such as particle simulation, mesh generation, motion planning, and skeleton extraction.


Custom Memory Allocation for Free: Improving Data Locality with Container-Centric Memory Allocation, Alin Jula, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC.) Lecture Notes in Computer Science, Vol: 4382, New Orleans, Louisiana, USA, Nov 2006. DOI: 10.1007/978-3-540-72521-3_22
Keywords: Memory Management, Runtime Systems, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-540-72521-3_22,
author="Jula, Alin
and Rauchwerger, Lawrence",
editor="Alm{\'a}si, George
and Ca{\c{s}}caval, C{\u{a}}lin
and Wu, Peng",
title="Custom Memory Allocation for Free",
booktitle="Languages and Compilers for Parallel Computing",
year="2007",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="299--313",
abstract="We present a novel and efficient container-centric memory allocator, named Defero, which allows a container to guide the allocation of its elements. The guidance is supported by the semantic-rich context of containers in which a new element is inserted. Defero allocates based on two attributes: size and location. Our policy of allocating a new object close to a related object often results in significantly increased memory reference locality. Defero has been integrated to work seamlessly with the C++ Standard Template Library (STL) containers. The communication between containers and the memory allocator is very simple and insures portability. STL container modification is the only needed code change to achieve custom memory allocation. We present experimental results that show the performance improvements that can be obtained by using Defero as a custom allocator for STL applications. We have applied our memory allocator to the molecular dynamics and compiler applications and obtained significant performance improvements over using the standard GNU STL allocator. With our approach custom memory allocation has been achieved without any modification of the actual applications, i.e., without additional programming efforts.",
isbn="978-3-540-72521-3"
}


Abstract

We present a novel and efficient container-centric memory allocator, named Defero, which allows a container to guide the allocation of its elements. The guidance is supported by the semantic-rich context of containers in which a new element is inserted. Defero allocates based on two attributes: size and location. Our policy of allocating a new object close to a related object often results in significantly increased memory reference locality. Defero has been integrated to work seamlessly with the C++ Standard Template Library (STL) containers. The communication between containers and the memory allocator is very simple and insures portability. STL container modification is the only needed code change to achieve custom memory allocation. We present experimental results that show the performance improvements that can be obtained by using Defero as a custom allocator for STL applications. We have applied our memory allocator to the molecular dynamics and compiler applications and obtained significant performance improvements over using the standard GNU STL allocator. With our approach custom memory allocation has been achieved without any modification of the actual applications, i.e., without additional programming efforts.


Design and use of HTAlib - A library for hierarchically tiled arrays, Ganesh Bikshandy, Jia Guo, Christoph von Praun, Gabriel Tanase, Basilio Fraguela, Maria Jesus Garzaran, David Padua, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC.) Lecture Notes in Computer Science, Vol: 4382, pp. 17-32, New Orleans, LA, USA, Nov 2006. DOI: 10.1007/978-3-540-72521-3_3
Keywords: Memory Management, Parallel Algorithms, Parallel Containers
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-540-72521-3_3,
author="Bikshandi, Ganesh
and Guo, Jia
and von Praun, Christoph
and Tanase, Gabriel
and Fraguela, Basilio B.
and Garzar{\'a}n, Mar{\'i}a J.
and Padua, David
and Rauchwerger, Lawrence",
editor="Alm{\'a}si, George
and Ca{\c{s}}caval, C{\u{a}}lin
and Wu, Peng",
title="Design and Use of htalib -- A Library for Hierarchically Tiled Arrays",
booktitle="Languages and Compilers for Parallel Computing",
year="2007",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="17--32",
abstract="Hierarchically Tiled Arrays (HTAs) are data structures that facilitate locality and parallelism of array intensive computations with block-recursive nature. The model underlying HTAs provides programmers with a global view of distributed data as well as a single-threaded view of the execution. In this paper we present htalib, a C++ implementation of HTAs. This library provides several novel constructs: (i) A map-reduce operator framework that facilitates the implementation of distributed operations with HTAs. (ii) Overlapped tiling in support of tiling in stencil codes. (iii) Data layering, facilitating the use of HTAs in adaptive mesh refinement applications. We describe the interface and design of htalib and our experience with the new programming constructs.",
isbn="978-3-540-72521-3"
}


Abstract

Hierarchically Tiled Arrays (HTAs) are data structures that facilitate locality and parallelism of array intensive computations with block-recursive nature. The model underlying HTAs provides programmers with a global view of distributed data as well as a single-threaded view of the execution. In this paper we present htalib, a C++ implementation of HTAs. This library provides several novel constructs: (i) A map-reduce operator framework that facilitates the implementation of distributed operations with HTAs. (ii) Overlapped tiling in support of tiling in stencil codes. (iii) Data layering, facilitating the use of HTAs in adaptive mesh refinement applications. We describe the interface and design of htalib and our experience with the new programming constructs.


ARMI: A High Level Communication Library for STAPL, Nathan Thomas, Steven Saunders, Tim Smith, Gabriel Tanase, Lawrence Rauchwerger, Parallel Processing Letters, Vol: 16, Issue: 2, pp. 261-280, Jun 2006. DOI: 10.1142/S0129626406002617
Keywords: Communication Library, Remote Method Invocation, Runtime Systems
Links : [Published]

BibTex

@article{10.1142/S0129626406002617,
author = "Nathan Thomas and Steven Saunders and Tim Smith and Gabriel Tanase and Lawrence Rauchwerger",
title = "ARMI: A High Level Communication Library for STAPL",
journal = "Parallel Processing Letters",
year = 2006,
volume = "16",
number = "2",
pages = "261-280",
}


Abstract

ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP, MPI implementations and/or a combination thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP-V2200, Origin 3800, IBM Regatta and IBM RS/6000 SP cluster.


Simultaneous Shape Decomposition and Skeletonization, Jyh-Ming Lien, John Keyser , Nancy M. Amato, In. Proc. of the 2006 ACM symposium on Solid and physical modeling, Cardiff, Wales, United Kingdom, Jun 2006. DOI: 10.1145/1128888.1128919
Keywords: Computational Geometry, Convex Decomposition
Links : [Published]

BibTex

@inproceedings{10.1145/1128888.1128919,
author = {Lien, Jyh-Ming and Keyser, John and Amato, Nancy M.},
title = {Simultaneous Shape Decomposition and Skeletonization},
year = {2006},
isbn = {1595933581},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1128888.1128919},
doi = {10.1145/1128888.1128919},
booktitle = {Proceedings of the 2006 ACM Symposium on Solid and Physical Modeling},
pages = {219–228},
numpages = {10},
keywords = {skeletonization, convex decomposition, multi-resolution skeleton},
location = {Cardiff, Wales, United Kingdom},
series = {SPM '06}
}


Abstract

Shape decomposition and skeletonization share many common properties and applications. However, they are generally treated as independent computations. In this paper, we propose an iterative approach that simultaneously generates a hierarchical shape decomposition and a corresponding set of multi-resolution skeletons. In our method, a skeleton of a model is extracted from the components of its decomposition --- that is, both processes and the qualities of their results are interdependent. In particular, if the quality of the extracted skeleton does not meet some user specified criteria, then the model is decomposed into finer components and a new skeleton is extracted from these components. The process of simultaneous shape decomposition and skeletonization iterates until the quality of the skeleton becomes satisfactory. We provide evidence that the proposed framework is efficient and robust under perturbation and. deformation. We also demonstrate that our results can readily be used in problems including skeletal deformations and virtual reality navigation.


Simultaneous Shape Decomposition and Skeletonization, Jyh-Ming Lien, John Keyser , Nancy M. Amato, In. Proc. of the 2006 ACM symposium on Solid and physical modeling, Cardiff, Wales, United Kingdom, Jun 2006. DOI: 10.1145/1128888.1128919
Keywords: Computational Geometry, Convex Decomposition
Links : [Published]

BibTex

@inproceedings{10.1145/1128888.1128919,
author = {Lien, Jyh-Ming and Keyser, John and Amato, Nancy M.},
title = {Simultaneous Shape Decomposition and Skeletonization},
year = {2006},
isbn = {1595933581},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1128888.1128919},
doi = {10.1145/1128888.1128919},
booktitle = {Proceedings of the 2006 ACM Symposium on Solid and Physical Modeling},
pages = {219–228},
numpages = {10},
keywords = {skeletonization, convex decomposition, multi-resolution skeleton},
location = {Cardiff, Wales, United Kingdom},
series = {SPM '06}
}


Abstract

Shape decomposition and skeletonization share many common properties and applications. However, they are generally treated as independent computations. In this paper, we propose an iterative approach that simultaneously generates a hierarchical shape decomposition and a corresponding set of multi-resolution skeletons. In our method, a skeleton of a model is extracted from the components of its decomposition --- that is, both processes and the qualities of their results are interdependent. In particular, if the quality of the extracted skeleton does not meet some user specified criteria, then the model is decomposed into finer components and a new skeleton is extracted from these components. The process of simultaneous shape decomposition and skeletonization iterates until the quality of the skeleton becomes satisfactory. We provide evidence that the proposed framework is efficient and robust under perturbation and. deformation. We also demonstrate that our results can readily be used in problems including skeletal deformations and virtual reality navigation.


VIZMO++: a visualization, authoring, and educational tool for motion planning, Aimée Vargas E., Jyh-Ming Lien, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 727-732, Orlando, Florida, USA, May 2006. DOI: 10.1109/ROBOT.2006.1641796
Keywords: Sampling-Based Motion Planning, Visualization
Links : [Published]

BibTex

@INPROCEEDINGS{1641796,
author={A. V. {Estrada} and {Jyh-Ming Lien} and N. M. {Amato}},
booktitle={Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006.},
title={VIZMO++: a visualization, authoring, and educational tool for motion planning},
year={2006},
volume={},
number={},
pages={727-732},
doi={10.1109/ROBOT.2006.1641796}}


Abstract

Comprehension of concepts and algorithms involved in the robotics field can be improved through the use of an interactive visualization tool. In this paper we present an interactive tool for visualizing and editing motion planning environments, problem instances, and their solutions. Teachers can take advantage of visualization tools to help their students to better understand motion planning and its complexity as well as the different strategies that have been developed to solve the motion planning problem. While the tool we present allows the animation, manipulation, and evaluation of solution paths found by any motion planner, it is specialized for sampling-based randomized planners such as probabilistic roadmap (PRM) and rapidly-exploring random tree (RRT) methods.


Region Array SSA, Silvius Rus, Guobin He, Lawrence Rauchwerger, Technical Report, TR06-007, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, USA, May 2006.
Keywords: Data Flow, Optimizing Compilers
Links : [Manuscript]

BibTex

@techreport{srus-tr06-007-2006,
title = "Region Array SSA",
author = "Rus, Silvius and He, Guobin and Rauchwerger, Lawrence",
institution = "Parasol Laboratory, Department of Computer Science, Texas A&M University",
address = "College Station, TX",
number = "TR06-007",
year = 2006,
month = may
}


Abstract

Static Single Assignment (SSA) has become the intermediate program representation of choice in most modern compilers because it enables efficient data flow analysis of scalars and thus leads to better scalar optimizations. Unfortunately not much progress has been achieved in applying the same techniques to array data flow analysis, a very important and potentially powerful technology. In this paper we propose to improve the applicability of previous efforts in array SSA through the use of a symbolic memory access descriptor that can aggregate the accesses to the elements of an array over large, interprocedural program contexts. We then show the power of our new representation by using it to implement a basic data flow algorithm, reaching definitions. Finally we apply this analysis to array constant propagation and array privatization and show performance improvement (speedups) for benchmark codes.


Planning Motion in Completely Deformable Environments, Samuel Rodriguez, Jyh-Ming Lien, Nancy M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Orlando, Florida, USA, May 2006. DOI: 10.1109/ROBOT.2006.1642072
Keywords: Deformable Objects, Kinodynamic Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{1642072, author={S. {Rodriguez} and {Jyh-Ming Lien} and N. M. {Amato}}, booktitle={Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006.}, title={Planning motion in completely deformable environments}, year={2006}, volume={}, number={}, pages={2466-2471}, doi={10.1109/ROBOT.2006.1642072}}


Abstract

Though motion planning has been studied extensively for rigid and articulated robots, motion planning for deformable objects is an area that has received far less attention. In this paper we present a framework for planning paths in completely deformable, elastic environments. We apply a deformable model to the robot and obstacles in the environment and present a kinodynamic planning algorithm suited for this type of deformable motion planning. The planning algorithm is based on the rapidly-exploring random tree (RRT) path planning algorithm. To the best of our knowledge, this is the first work that plans paths in totally deformable environments


An Obstacle-Based Rapidly-Exploring Random Tree, Samuel Rodriguez, Xinyu Tang, Jyh-Ming Lien, Nancy M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Orlando, Florida, USA, May 2006. DOI: 10.1109/ROBOT.2006.1641823
Keywords: Rapidly-exploring Random Tree (RRT), Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{1641823, author={ {Rodriguez} and {Xinyu Tang} and {Jyh-Ming Lien} and N. M. {Amato}}, booktitle={Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006.}, title={An obstacle-based rapidly-exploring random tree}, year={2006}, volume={}, number={}, pages={895-900}, doi={10.1109/ROBOT.2006.1641823}}


Abstract

Tree-based path planners have been shown to be well suited to solve various high dimensional motion planning problems. Here we present a variant of the Rapidly-Exploring Random Tree (RRT) path planning algorithm that is able to explore narrow passages or difficult areas more effectively. We show that both workspace obstacle information and C-space information can be used when deciding which direction to grow. The method includes many ways to grow the tree, some taking into account the obstacles in the environment. This planner works best in difficult areas when planning for free flying rigid or articulated robots. Indeed, whereas the standard RRT can face difficulties planning in a narrow passage, the tree based planner presented here works best in these areas


SmartApps: Middleware for Adaptive Applications on Reconfigurable Platforms, Lawrence Rauchwerger, Nancy Amato, ACM SIGOPS Operating Systems Review, Vol: 40, Issue: 2, pp. 73-82, Apr 2006. DOI: 10.1145/1131322.1131338
Keywords: High-Performance Computing, Memory Management, Runtime Systems
Links : [Published]

BibTex

@article{10.1145/1131322.1131338,
author = {Rauchwerger, Lawrence and Amato, Nancy M.},
title = {SmartApps: Middle-Ware for Adaptive Applications on Reconfigurable Platforms},
year = {2006},
issue_date = {April 2006},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {40},
number = {2},
issn = {0163-5980},
url = {https://doi.org/10.1145/1131322.1131338},
doi = {10.1145/1131322.1131338},
abstract = {One general avenue to obtain optimized performance on large and complex systems is to approach optimization from a global perspective of the complete system in a customized manner for each application, i.e., application-centric optimization. Lately, there have been encouraging developments in reconfigurable operating systems and hardware that will enable customized optimization. For example, machines built with PIM's and FPGA's can be quickly reconfigured to better fit a certain application and operating systems, such as IBM's K42, can have their services customized to fit the needs and characteristics of an application. While progress in operating system and hardware and hardware has made re-configuration possible, we still need strategies and techniques to exploit them for improved application performance.In this paper, we describe the approach we are using in our smart application (SMARTAPPS) project. In the SMARTAPP executable, the compiler embeds most run-time system services and a feedback loop to monitor performance and trigger run-time adaptations. At run-time, after incorporating the code's input and determining the system's state, the SMARTAPP performs an instance specific optimization. During execution, the application continually monitors its performance and the available resources to determine if restructuring should occur. The framework includes mechanisms for performing the actual restructuring at various levels including: algorithmic adaptation, tuning reconfigurable OS services (scheduling policy, page size, etc.), and system configuration (e.g., number of processors). This paper concentrates on the techniques for providing customized system services for communication, thread scheduling, memory management, and performance monitoring and modeling.},
journal = {SIGOPS Oper. Syst. Rev.},
month = apr,
pages = {73–82},
numpages = {10}
}


Abstract

One general avenue to obtain optimized performance on large and complex systems is to approach optimization from a global perspective of the complete system in a customized manner for each application, i.e., application-centric optimization. Lately, there have been encouraging developments in reconfigurable operating systems and hardware that will enable customized optimization. For example, machines built with PIM's and FPGA's can be quickly reconfigured to better fit a certain application and operating systems, such as IBM's K42, can have their services customized to fit the needs and characteristics of an application. While progress in operating system and hardware and hardware has made re-configuration possible, we still need strategies and techniques to exploit them for improved application performance.In this paper, we describe the approach we are using in our smart application (SMARTAPPS) project. In the SMARTAPP executable, the compiler embeds most run-time system services and a feedback loop to monitor performance and trigger run-time adaptations. At run-time, after incorporating the code's input and determining the system's state, the SMARTAPP performs an instance specific optimization. During execution, the application continually monitors its performance and the available resources to determine if restructuring should occur. The framework includes mechanisms for performing the actual restructuring at various levels including: algorithmic adaptation, tuning reconfigurable OS services (scheduling policy, page size, etc.), and system configuration (e.g., number of processors). This paper concentrates on the techniques for providing customized system services for communication, thread scheduling, memory management, and performance monitoring and modeling.


A General Performance Model of Structured and Unstructured Mesh Particle Transport Computations, Mark M. Mathis, Darren J. Kerbyson, Journal of Supercomputing, Vol: 34, Issue: 2, pp. 181-199, Nov 2005. DOI: 10.1007/s11227-005-2339-8
Keywords: High-Performance Computing, Parallel Algorithms
Links : [Published]

BibTex

@Article{Mathis2005,
author={Mathis, Mark M.
and Kerbyson, Darren J.},
title={A General Performance Model of Structured and Unstructured Mesh Particle Transport Computations},
journal={The Journal of Supercomputing},
year={2005},
month={Nov},
day={01},
volume={34},
number={2},
pages={181-199},
abstract={The performance of unstructured mesh applications presents a number of complexities and subtleties that do not arise for dense structured meshes. From a programming point of view, the handling of unstructured meshes has an increased complexity in order to manage the necessary data structures and interactions between mesh-cells. From a performance point of view, there are added difficulties in understanding both the processing time on a single processor and the scaling characteristics when using large-scale parallel systems. In this work we present a general performance model for the calculation of deterministic SNtransport on unstructured meshes that is also applicable to structured meshes. The model captures the key processing characteristics of the calculation and is parametric using both system performance data (latency, bandwidth, processing rate etc.) and application data (mesh size etc.) as input. A single formulation of the model is used to predict the performance of two quite different implementations of the same calculation. It is validated on two clusters (an HP AlphaServer and an Itanium-2 system) showing high prediction accuracy.},
issn={1573-0484},
doi={10.1007/s11227-005-2339-8},
url={https://doi.org/10.1007/s11227-005-2339-8}
}


Abstract

The performance of unstructured mesh applications presents a number of complexities and subtleties that do not arise for dense structured meshes. From a programming point of view, the handling of unstructured meshes has an increased complexity in order to manage the necessary data structures and interactions between mesh-cells. From a performance point of view, there are added difficulties in understanding both the processing time on a single processor and the scaling characteristics when using large-scale parallel systems. In this work we present a general performance model for the calculation of deterministic S N transport on unstructured meshes that is also applicable to structured meshes. The model captures the key processing characteristics of the calculation and is parametric using both system performance data (latency, bandwidth, processing rate etc.) and application data (mesh size etc.) as input. A single formulation of the model is used to predict the performance of two quite different implementations of the same calculation. It is validated on two clusters (an HP AlphaServer and an Itanium-2 system) showing high prediction accuracy.


Protein Folding by Motion Planning, Shawna Thomas, Guang Song, Nancy M. Amato, Physical Biology, Vol: 2, Issue: 4, pp. S148-S155, Nov 2005. DOI: 10.1088/1478-3975/2/4/S09
Keywords: Protein Folding, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{Thomas_2005,
doi = {10.1088/1478-3975/2/4/s09},
url = {https://doi.org/10.1088%2F1478-3975%2F2%2F4%2Fs09},
year = 2005,
month = {nov},
publisher = {{IOP} Publishing},
volume = {2},
number = {4},
pages = {S148--S155},
author = {Shawna Thomas and Guang Song and Nancy M Amato},
title = {Protein folding by motion planning},
journal = {Physical Biology},
abstract = {We investigate a novel approach for studying protein folding that has evolved from robotics motion planning techniques called probabilistic roadmap methods (PRMs). Our focus is to study issues related to the folding process, such as the formation of secondary and tertiary structures, assuming we know the native fold. A feature of our PRM-based framework is that the large sets of folding pathways in the roadmaps it produces, in just a few hours on a desktop PC, provide global information about the protein\'s energy landscape. This is an advantage over other simulation methods such as molecular dynamics or Monte Carlo methods which require more computation and produce only a single trajectory in each run. In our initial studies, we obtained encouraging results for several small proteins. In this paper, we investigate more sophisticated techniques for analyzing the folding pathways in our roadmaps. In addition to more formally revalidating our previous results, we present a case study showing that our technique captures known folding differences between the structurally similar proteins G and L.}
}


Abstract

We investigate a novel approach for studying protein folding that has evolved from robotics motion planning techniques called probabilistic roadmap methods (PRMs). Our focus is to study issues related to the folding process, such as the formation of secondary and tertiary structures, assuming we know the native fold. A feature of our PRM-based framework is that the large sets of folding pathways in the roadmaps it produces, in just a few hours on a desktop PC, provide global information about the protein\'s energy landscape. This is an advantage over other simulation methods such as molecular dynamics or Monte Carlo methods which require more computation and produce only a single trajectory in each run. In our initial studies, we obtained encouraging results for several small proteins. In this paper, we investigate more sophisticated techniques for analyzing the folding pathways in our roadmaps. In addition to more formally revalidating our previous results, we present a case study showing that our technique captures known folding differences between the structurally similar proteins G and L.


Hybrid Dependence Analysis for Automatic Parallelization, Silvius Rus, Lawrence Rauchwerger, Technical Report, TR05-013, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, USA, Nov 2005.
Keywords: Data Dependence, Optimizing Compilers, Parallelising Compilers
Links : [Manuscript]

BibTex

@techreport{srus-tr05-013-2005,
title = "Hybrid Dependence Analysis for Automatic Parallelization",
author = "Rus, Silvius and Rauchwerger, Lawrence",
institution = "Parasol Laboratory, Department of Computer Science, Texas A&M University",
address = "College Station, TX",
number = "TR05-013",
year = 2005,
month = nov
}


Abstract

Automatic program parallelization has been an elusive goal for many years. It has recently become more important due to the widespread introduction of multi-cores in PCs. Automatic parallelization could not be achieved because classic compiler analysis was neither powerful enough and program behavior was found to be in many cases input dependent. Run-time thread level parallelization, introduced in 1995, was a welcome but somewhat different avenue for advancing parallelization coverage. In this paper we introduce a novel analysis, Hybrid Analysis (HA), which unifies static and dynamic memory reference techniques into a seamless compiler framework which extracts almost maximum available parallelism from scientific codes and generates minimum run-time overhead. In this paper we will present how we can extract maximum information from the quantities that could not be sufficiently analyzed through static compiler methods and generate sufficient conditions which, when evaluated dynamically can validate optimizations. A large number of experiments confirm the viability of our techniques, which have been implemented in the Polaris compiler.


An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications, Ravi Iyer, Jack Perdue, Lawrence Rauchwerger, Nancy M. Amato and Laxmi Bhuyan, International Journal of Parallel Programming, Vol: 33, pp. 307-350, Aug 2005. DOI: 10.1007/s10766-004-1187-0
Keywords: High-Performance Computing, Parallel Processing, Scientific Computing
Links : [Published]

BibTex

@article{iyer-ijpp-2005
, author = "Ravi Iyer and Jack Perdue and Lawrence Rauchwerger and Nancy M. Amato and Laxmi Bhuyan"
, title = "An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications"
, journal = "Internaational Journal of Parallel Programming"
, volume = "33"
, pages = "307--350"
, year = "2005"
, doi = "10.1007/s10766-004-1187-0"
}


Abstract

As processor technology continues to advance at a rapid pace, the principal performance bottleneck of shared memory systems has become the memory access latency. In order to understand the effects of cache and memory hierarchy on system latencies, performance analysts perform benchmark analysis on existing multiprocessors. In this study, we present a detailed comparison of two architectures, the HP V-Class and the SGI Origin 2000. Our goal is to compare and contrast design techniques used in these multiprocessors. We present the impact of processor design, cache/memory hierarchies and coherence protocol optimizations on the memory system performance of these multiprocessors. We also study the effect of parallelism overheads such as process creation and synchronization on the user-level performance of these multiprocessors. Our experimental methodology uses microbenchmarks as well as scientific applications to characterize the user-level performance. Our microbenchmark results show the impact of Ll/L2 cache size and TLB size on uniprocessor load/store latencies, the effect of coherence protocol design/optimizations and data sharing patterns on multiprocessor memory access latencies and finally the overhead of parallelism. Our application-based evaluation shows the impact of problem size, dominant sharing patterns and number of Processors used on speedup and raw execution time. Finally, we use hardware counter measurements to study the correlation of system-level performance metrics and the application’s execution time performance.


Iterative Relaxation of Constraints: A Framework for Improving Automated Motion Planning, O. Burchan Bayazit, Dawen Xie, Nancy M. Amato, In Proc. IEEE Int. Conf. Intel. Rob. Syst. (IROS), pp. 586-593, Edmonton, Alberta, Canada, Aug 2005. DOI: 10.1109/IROS.2005.1545045
Keywords: Computational Biology, Sampling-Based Motion Planning, Virtual Reality
Links : [Published]

BibTex

@INPROCEEDINGS{1545045,
author={O. B. {Bayazit} and {Dawen Xie} and N. M. {Amato}},
booktitle={2005 IEEE/RSJ International Conference on Intelligent Robots and Systems}, title={Iterative relaxation of constraints: a framework for improving automated motion planning},
year={2005},
volume={},
number={},
pages={3433-3440},
doi={10.1109/IROS.2005.1545045}}


Abstract

This paper presents a technique for improving the efficiency of automated motion planners. Motion planning has application in many areas such as robotics, virtual reality systems, computer-aided design, and even computational biology. Although there have been steady advances in motion planning algorithms, especially in randomized approaches such as probabilistic roadmap methods (PRMs) or rapidly-exploring random trees (RRTs), there are still some classes of problems that cannot be solved efficiently using these state-of-the-art motion planners. In this paper, we suggest an iterative strategy addressing this problem where we first simplify the problem by relaxing some feasibility constraints, solve the easier version of the problem, and then use that solution to help us find a solution for the harder problem. We show how this strategy can be applied to rigid bodies and to linkages with high degrees of freedom, including both open and closed chain systems. Experimental results are presented for linkages composed of 9-98 links. Although we use PRMs as the automated planner, the framework is general and can be applied with other motion planning techniques as well.


A Framework for Adaptive Algorithm Selection in STAPL, Nathan Thomas, Gabriel Tanase, Olga Tkachyshyn, Jack Perdue, Nancy M. Amato, Lawrence Rauchwerger, Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 277-288, Chicago, IL, USA, Jun 2005. DOI: 10.1145/1065944.1065981
Keywords: Parallel Algorithms, Parallel Programming, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/1065944.1065981,
author = {Thomas, Nathan and Tanase, Gabriel and Tkachyshyn, Olga and Perdue, Jack and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {A Framework for Adaptive Algorithm Selection in STAPL},
year = {2005},
isbn = {1595930809},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1065944.1065981},
doi = {10.1145/1065944.1065981},
abstract = {Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.},
booktitle = {Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
pages = {277–288},
numpages = {12},
keywords = {machine learning, parallel algorithms, matrix multiplication, sorting, adaptive algorithms},
location = {Chicago, IL, USA},
series = {PPoPP \'05}
}


Abstract

Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.


C-space Subdivision and Integration in Feature-Sensitive Motion Planning, Marco A. Morales A., Lydia Tapia, Roger Pearce, Samuel Rodriguez, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 3114-3119, Barcelona, Spain, May 2005. DOI: 10.1109/ROBOT.2005.1570589
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{1570589,
author={M. A. {Morales A.} and L. {Tapia} and R. {Pearce} and S. {Rodriguez} and N. M. {Amato}},
booktitle={Proceedings of the 2005 IEEE International Conference on Robotics and Automation}, title={C-space Subdivision and Integration in Feature-Sensitive Motion Planning},
year={2005},
volume={},
number={},
pages={3114-3119},
doi={10.1109/ROBOT.2005.1570589}}


Abstract

There are many randomized motion planning techniques, but it is often difficult to determine what planning method to apply to best solve a problem. Planners have their own strengths and weaknesses, and each one is best suited to a specific type of problem. In previous work, we proposed a meta-planner that, through analysis of the problem features, subdivides the instance into regions and determines which planner to apply in each region. The results obtained with our prototype system were very promising even though it utilized simplistic strategies for all components. Even so, we did determine that strategies for problem subdivision and for combination of partial regional solutions have a crucial impact on performance. In this paper, we propose new methods for these steps to improve the performance of the meta-planner. For problem subdivision, we propose two new methods: a method based on ‘ gaps’ and a method based on information theory. For combining partial solutions, we propose two new methods that concentrate on neighboring areas of the regional solutions. We present results that show the performance gain achieved by utilizing these new strategies.


C-Space Subdivision and Integration in Feature-Sensitive Motion Planning, Marco A. Morales A., Lydia Tapia, Roger Pearce, Samuel Rodriguez, Nancy M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), pp. 3114-3119, Barcelona, Spain, Apr 2005. DOI: 10.1109/ROBOT.2005.1570589
Keywords: Machine Learning, Motion Planning, Sampling-Based Motion Planning
Links : [Published] [Manuscript]

BibTex

@INPROCEEDINGS{1570589,
author={M. A. {Morales A.} and L. {Tapia} and R. {Pearce} and S. {Rodriguez} and N. M. {Amato}},
booktitle={Proceedings of the 2005 IEEE International Conference on Robotics and Automation},
title={C-space Subdivision and Integration in Feature-Sensitive Motion Planning},
year={2005},
volume={},
number={},
pages={3114-3119},
doi={10.1109/ROBOT.2005.1570589}}


Abstract

There are many randomized motion planning techniques, but it is often difficult to determine what planning method to apply to best solve a problem. Planners have their own strengths and weaknesses, and each one is best suited to a specific type of problem. In previous work, we proposed a meta-planner that, through analysis of the problem features, subdivides the instance into regions and determines which planner to apply in each region. The results obtained with our prototype system were very promising even though it utilized simplistic strategies for all components. Even so, we did determine that strategies for problem subdivision and for combination of partial regional solutions have a crucial impact on performance. In this paper, we propose new methods for these steps to improve the performance of the meta-planner. For problem subdivision, we propose two new methods: a method based on ‘ gaps’ and a method based on information theory. For combining partial solutions, we propose two new methods that concentrate on neighboring areas of the regional solutions. We present results that show the performance gain achieved by utilizing these new strategies.


Swarming Behavior Using Probabilistic Roadmap Techniques, O. Burçhan Bayazıt, Jyh-Ming Lien, Nancy M. Amato, Lecture Notes in Computer Science, Vol: 3342, pp. 112-125, Jan 2005. DOI: 10.1007/978-3-540-30552-1_10
Keywords: Group Behavior, Multi-Agent, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{10.1007/978-3-540-30552-1_10,
author ={O. Burçhan {Bayazıt}, Jyh-Ming {Lien}, Nancy M. {Amato}},
title ={Swarming Behavior Using Probabilistic Roadmap Techniques},
journal ={Lecture Notes in Computer Science},
volume ={3342},
year ={2003},
number ={},
pages ={112--125},
doi={10.1007/978-3-540-30552-1_10}
}


Abstract

While techniques exist for simulating swarming behaviors, these methods usually provide only simplistic navigation and planning capabilities. In this review, we explore the benefits of integrating roadmap-based path planning methods with flocking techniques to achieve different behaviors. We show how group behaviors such as exploring can be facilitated by using dynamic roadmaps (e.g., modifying edge weights) as an implicit means of communication between flock members. Extending ideas from cognitive modeling, we embed behavior rules in individual flock members and in the roadmap. These behavior rules enable the flock members to modify their actions based on their current location and state. We propose new techniques for several distinct group behaviors: homing, exploring (covering and goal searching), passing through narrow areas and shepherding. We present results that show that our methods provide significant improvement over methods that utilize purely local knowledge and moreover, that we achieve performance approaching that which could be obtained by an ideal method that has complete global knowledge. Animations of these behaviors can be viewed on our webpages.


Run-time Optimizations of Adaptive Irregular Applications, Hao Yu, Doctoral dissertation, Texas A&M University, College Station, TX, USA, Nov 2004.
Keywords: Data Dependence, Optimizing Compilers
Links : [Published]

BibTex

@phdthesis{hyu-phdthesis-2004,
author = "Yu, Hao",
title = "Run-time Optimizations of Adaptive Irregular Applications",
school = "Texas A&M University",
address = "College Station, TX, USA",
year = 2004,
month = nov
}


Abstract

Compared to traditional compile-time optimization, run-time optimization could offer significant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of input data, programs' dynamic behaviors, and the underneath execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on programs' input data and their dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of programs' dynamic memory access patterns and the appropriate optimization (parallelization) transformations. First, we present a general adaptive algorithm selection framework to automatically and adaptively select at run-time the best performing, functionally equivalent algorithm for each of its execution instances. The selection process is based on off-line automatically generated prediction models and characteristics (collected and analyzed dynamically) of the algorithm's input data, In this dissertation, we specialize this framework for automatic selection of reduction algorithms. In this research, we have identified a small set of machine independent high-level characterization parameters and then we deployed an off-line, systematic experiment process to generate prediction models. These models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly in terms of applications, platforms, and programs' dynamic behaviors. Specifically, for the reduction algorithm selection, the selected performance is within 2% of optimal performance and on average is 60% better than "Replicated Buffer," the default parallel reduction algorithm specified by OpenMP standard. To reduce the overhead of speculative run-time parallelization, we have developed an adaptive run-time parallelization technique that dynamically chooses effcient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems.


The Value Evolution Graph and its Use in Memory Reference Analysis, Silvius Rus, Dongmin Zhang, Lawrence Rauchwerger, 13th International Conference on Parallel Architecture and Compilation Techniques (PACT), pp. 243-254, Antibes, Juan-les-Pins, France, Sep 2004. DOI: 10.1109/PACT.2004.1342558
Keywords: Data Flow, Optimizing Compilers
Links : [Published]

BibTex

@INPROCEEDINGS{1342558, author={S. {Rus} and D. {Zhang} and L. {Rauchwerger}}, booktitle={Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.}, title={The value evolution graph and its use in memory reference analysis}, year={2004}, volume={}, number={}, pages={243-254}, doi={10.1109/PACT.2004.1342558}}


Abstract

We introduce a framework for the analysis of memory reference sets addressed by induction variables without closed forms. This framework relies on a new data structure, the value evolution graph (VEG), which models the global flow of values taken by induction variable with and without closed forms. We describe the application of our framework to array data-flow analysis, privatization, and dependence analysis. This results in the automatic parallelization of loops that contain arrays addressed by induction variables without closed forms. We implemented this framework in the Polaris research compiler. We present experimental results on a set of codes from the PERFECT, SPEC, and NCSA benchmark suites.


Automatic Parallelization Using the Value Evolution Graph, Silvius Rus, Dongmin Zhang, Lawrence Rauchwerger, Languages and Compilers for High Performance Computing (LCPC), pp. 379-393, West Lafayette, Indiana, USA, Sep 2004. DOI: 10.1007/11532378_27
Keywords: Data Flow, Optimizing Compilers
Links : [Published]

BibTex

@InProceedings{10.1007/11532378_27,
author=\"Rus, Silvius
and Zhang, Dongmin
and Rauchwerger, Lawrence\",
editor=\"Eigenmann, Rudolf
and Li, Zhiyuan
and Midkiff, Samuel P.\",
title=\"Automatic Parallelization Using the Value Evolution Graph\",
booktitle=\"Languages and Compilers for High Performance Computing\",
year=\"2005\",
publisher=\"Springer Berlin Heidelberg\",
address=\"Berlin, Heidelberg\",
pages=\"379--393\",
abstract=\"We introduce a framework for the analysis of memory reference sets addressed by induction variables without closed forms. This framework relies on a new data structure, the Value Evolution Graph (VEG), which models the global flow of scalar and array values within a program. We describe the application of our framework to array data-flow analysis, privatization, and dependence analysis. This results in the automatic parallelization of loops that contain arrays indexed by induction variables without closed forms. We implemented this framework in the Polaris research compiler. We present experimental results on a set of codes from the PERFECT, SPEC, and NCSA benchmark suites.\",
isbn=\"978-3-540-31813-2\"
}


Abstract

We introduce a framework for the analysis of memory reference sets addressed by induction variables without closed forms. This framework relies on a new data structure, the Value Evolution Graph (VEG), which models the global flow of scalar and array values within a program. We describe the application of our framework to array data-flow analysis, privatization, and dependence analysis. This results in the automatic parallelization of loops that contain arrays indexed by induction variables without closed forms. We implemented this framework in the Polaris research compiler. We present experimental results on a set of codes from the PERFECT, SPEC, and NCSA benchmark suites.


An Adaptive Algorithm Selection Framework, Hao Yu, Dongmin Zhang, Lawrence Rauchwerger, 13th International Conference on Parallel Architecture and Compilation Techniques (PACT), pp. 278-289, Antibes, Juan-les-Pins, France, Sep 2004. DOI: 10.1109/PACT.2004.1342561
Keywords: Adaptive Algorithm, Machine Learning
Links : [Published]

BibTex

@INPROCEEDINGS{1342561, author={H. {Yu} and D. {Zhang} and L. {Rauchwerger}}, booktitle={Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.}, title={An adaptive algorithm selection framework}, year={2004}, volume={}, number={}, pages={278-289}, doi={10.1109/PACT.2004.1342561}}


Abstract

Irregular and dynamic memory reference patterns can cause performance variations for low level algorithms in general and for parallel algorithms in particular. We present an adaptive algorithm selection framework which can collect and interpret the inputs of a particular instance of a parallel algorithm and select the best performing one from an existing library. We present the dynamic selection of parallel reduction algorithms. First we introduce a set of high-level parameters that can characterize different parallel reduction algorithms. Then we describe an offline, systematic process to generate predictive models, which can be used for run-time algorithm selection. Our experiments show that our framework: (a) selects the most appropriate algorithms in 85% of the cases studied, (b) overall delivers 98% of the optimal performance, (c) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and (d) adapts to the underlying machine architecture (tested on IBM Regatta and HP V-Class systems).


A Machine Learning Approach for Feature-Sensitive Motion Planning, Marco Morales, Lydia Tapia, Roger Pearce, Samuel Rodriguez, Nancy M. Amato, In Proc. Int. Wkshp. on Alg. Found. of Rob. (WAFR), pp. 361-376, Utrecht/Zeist, The Netherlands, Jul 2004. DOI: 10.1007/10991541_25
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{10.1007/10991541_25,
author={Marco {Morales}, Lydia {Tapia}, Roger {Pearce}, Samuel {Rodriguez}, Nancy M. {Amato}},
booktitle={In Proc. Int. Wkshp. on Alg. Found. of Rob. (WAFR)},
title={A Machine Learning Approach for Feature-Sensitive Motion Planning},
year={2004},
volume={}, number={}, pages={361--376},
doi={10.1007/10991541_25}
}


Abstract

Although there are many motion planning techniques, there is no method that outperforms all others for all problem instances. Rather, each technique has different strengths and weaknesses which makes it best-suited for certain types of problems. Moreover, since an environment can contain vastly different regions, there may not be a single planner that will perform well in all its regions. Ideally, one would use a suite of planners in concert and would solve the problem by applying the best-suited planner in each region. In this paper, we propose an automated framework for feature-sensitive motion planning. We use a machine learning approach to characterize and partition C-space into regions that are well suited to one of the methods in our library of roadmap-based motion planners. After the best-suited method is applied in each region, the resulting region roadmaps are combined to form a roadmap of the entire planning space. Over a range of problems, we demonstrate that our simple prototype system reliably outperforms any of the planners on their own.


Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures , Nancy M. Amato, Ken Dill, Guang Song, Journal of Computational Biology, Vol: 10, Issue: 3, pp. 239-255, Jul 2004. DOI: 10.1089/10665270360688002
Keywords: Computational Biology, Protein Folding, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{10.1089/10665270360688002,
author ={Nancy M. {Amato}, Ken {Dill}, Guang {Song}},
title ={Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures},
journal ={Journal of Computational Biology},
volume ={10},year ={2004},number ={3-4},pages ={239-255},
doi={10.1089/10665270360688002}
}


Abstract

We investigate a novel approach for studying the kinetics of protein folding. Our framework has evolved from robotics motion planning techniques called probabilistic roadmap methods (PRMs) that have been applied in many diverse fields with great success. In our previous work, we presented our PRM-based technique and obtained encouraging results studying protein folding pathways for several small proteins. In this paper, we describe how our motion planning framework can be used to study protein folding kinetics. In particular, we present a refined version of our PRM-based framework and describe how it can be used to produce potential energy landscapes, free energy landscapes, and many folding pathways all from a single roadmap which is computed in a few hours on a desktop PC. Results are presented for 14 proteins. Our ability to produce large sets of unrelated folding pathways may potentially provide crucial insight into some aspects of folding kinetics, such as proteins that exhibit both two-state and three-state kinetics that are not captured by other theoretical techniques.


Performance Modeling of Unstructured Mesh Particle Transport Computations, Mark M. Mathis, Darren J. Kerbyson, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings., pp. 245-, Santa Fe, NM, USA, Apr 2004. DOI: 10.1109/IPDPS.2004.1303300
Keywords: High-Performance Computing, Parallel Algorithms
Links : [Published]

BibTex

@INPROCEEDINGS{1303300,
author={M. M. {Mathis} and D. J. {Kerbyson}},
booktitle={18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.},
title={Performance modeling of unstructured mesh particle transport computations},
year={2004},
volume={},
number={},
pages={245-},
doi={10.1109/IPDPS.2004.1303300}}


Abstract

Summary form only given. The performance of unstructured mesh applications presents a number of complexities and subtleties that do not arise for dense structured meshes. From a programming point of view, handling an unstructured mesh has an increased complexity to manage the necessary data structures and interactions between mesh-cells. From a performance point of view, there are added difficulties in understanding both the processing time on a single processor and the scaling characteristics. Here we present a performance model for the calculation of deterministic S/sub N/ transport on unstructured meshes. It builds upon earlier work that successfully modeled the same calculation on structured meshes. The model captures the key processing characteristics and is parametric using both the system performance data (latency, bandwidth, processing rate etc.) and application data (mesh size etc.) as input. The model is validated on two clusters (an HP AlphaServer and an Itanium-2 system) showing high accuracy. Importantly it is also shown that a single formulation of the model can be used to predict the performance of two quite different implementations of the same calculation.


Parallel Protein Folding with STAPL, Shawna Thomas, Nancy M. Amato, 18th International Parallel and Distributed Processing Symposium (IPDPS), pp. 189-, Santa Fe, NM, USA, Apr 2004. DOI: 10.1109/IPDPS.2004.1303204
Keywords: Protein Folding, Sampling-Based Motion Planning, STAPL
Links : [Published]

BibTex

@INPROCEEDINGS{1303204,
author={S. {Thomas} and N. M. {Amato}},
booktitle={18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.},
title={Parallel protein folding with STAPL},
year={2004},
volume={},
number={},
pages={189-},
doi={10.1109/IPDPS.2004.1303204}}


Abstract

Summary form only given. The protein folding problem is to study how a protein dynamically folds to its so-called native state - an energetically stable, three-dimensional configuration. Understanding this process is of great practical importance since some devastating diseases such as Alzheimer\'s and bovine spongiform encephalopathy (Mad Cow) are associated with the misfolding of proteins. In our group, we have developed a new computational technique for studying protein folding that is based on probabilistic roadmap methods for motion planning. Our technique yields an approximate map of a protein\'s potential energy landscape that contains thousands of feasible folding pathways. We have validated our method against known experimental results. Other simulation techniques, such as molecular dynamics or Monte Carlo methods, require many orders of magnitude more time to produce a single, partial, trajectory. We report on our experiences parallelizing our method using STAPL (the standard template adaptive parallel library), that is being developed in the Parasol Lab at Texas A&M. An efficient parallel version enables us to study larger proteins with increased accuracy. We demonstrate how STAPL enables portable efficiency across multiple platforms without user code modification. We show performance gains on two systems: a dedicated Linux cluster and an extremely heterogeneous multiuser Linux cluster.


Enveloping multi-pocket obstacles with hexagonal metamorphic robots, Jennifer E. Walter, Mary E. Brooks, David F. Little, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 3, pp. 2204-2209, New Orleans, Louisiana, USA, Apr 2004. DOI: 10.1109/ROBOT.2004.1307389
Keywords: Mobile Robots, Motion Planning, Multi-Agent
Links : [Published]

BibTex

@INPROCEEDINGS{1307389,
author={J. E. {Walter} and M. E. {Brooks} and D. F. {Little} and N. M. {Amato}},
booktitle={IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004}, title={Enveloping multi-pocket obstacles with hexagonal metamorphic robots},
year={2004},
volume={3},
number={},
pages={2204-2209 Vol.3},
doi={10.1109/ROBOT.2004.1307389}}


Abstract

The problem addressed is reconfiguration planning for a metamorphic robotic system composed of any number of hexagonal robots when a single obstacle with multiple indentations or "pockets" is embedded in the goal environment. We extend our earlier work on filling a single pocket in an obstacle to the case where the obstacle surface may contain multiple pockets. The planning phase of our algorithm first determines whether the obstacle pockets provide sufficient clearance for module movement, i.e., whether the obstacle is "admissible". In this paper, we present algorithms that sequentially order individual pockets and order module placement inside each pocket. These algorithms ensure that every cell in each pocket is filled and that module deadlock and collision do not occur during reconfiguration. This paper also provides a complete overview of the planning stage that is executed prior to reconfiguration and presents a distributed reconfiguration schema for filling more than one obstacle pocket concurrently, followed by the envelopment of the entire obstacle. Lastly, we present examples of obstacles with multiple pockets that were successfully filled using our distributed reconfiguration simulator.


Shepherding Behaviors, Jyh-Ming Lien, O. Burchan Bayazit, Ross T. Sowell, Samuel Rodriguez, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 4, pp. 4159-4164, New Orleans, Louisiana, USA, Apr 2004. DOI: 10.1109/ROBOT.2004.1308924
Keywords: Group Behavior, Multi-Agent, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{1308924,
author={ {Jyh-Ming Lien} and O. B. {Bayazit} and R. T. {Sowell} and S. {Rodriguez} and N. M. {Amato}},
booktitle={IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004}, title={Shepherding behaviors},
year={2004},
volume={4},
number={},
pages={4159-4164 Vol.4},
doi={10.1109/ROBOT.2004.1308924}}


Abstract

Shepherding behaviors are a type of flocking behavior in which outside agents guide or control members of a flock. Shepherding behaviors can be found in various forms in nature. For example, herding, covering, patrolling and collecting are common types of shepherding behaviors. In this work, we investigate ways to simulate these types of behaviors. A shepherd uses roadmaps to steer the flock and to re-group separated flock members. This paper focuses on improving the shepherd's movements to gain better control of the flock's motion and use this improved control to demonstrate a wider variety of shepherding behaviors.


Using Motion Planning to Study RNA Folding Kinetics, Xinyu Tang, Bonnie Kirkpatrick, Shawna Thomas, Guang Song, Nancy M. Amato, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 252-261, Mar 2004. DOI: 10.1145/974614.974648
Keywords: Computational Biology, RNA, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{10.1145/974614.974648,
author = {Tang, Xinyu and Kirkpatrick, Bonnie and Thomas, Shawna and Song, Guang and Amato, Nancy M.},
title = {Using Motion Planning to Study RNA Folding Kinetics},
year = {2004},
isbn = {1581137559},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/974614.974648},
doi = {10.1145/974614.974648},
abstract = {We propose a novel, motion planning based approach to approximately map the energy landscape of an RNA molecule. Our method is based on the successful probabilistic roadmap motion planners that we have previously successfully applied to protein folding. The key advantage of our method is that it provides a sparse map that captures the main features of the landscape and which can be analyzed to compute folding kinetics. In this paper, we provide evidence that this approach is also well suited to RNA. We compute population kinetics and transition rates on our roadmaps using the master equation for a few moderately sized RNA and show that our results compare favorably with results of other existing methods.},
booktitle = {Proceedings of the Eighth Annual International Conference on Resaerch in Computational Molecular Biology},
pages = {252–261},
numpages = {10},
keywords = {motion planning, RNA, folding kinetics},
location = {San Diego, California, USA},
series = {RECOMB \'04}
}


Abstract

We propose a novel, motion planning based approach to approximately map the energy landscape of an RNA molecule. Our method is based on the successful probabilistic roadmap motion planners that we have previously successfully applied to protein folding. The key advantage of our method is that it provides a sparse map that captures the main features of the landscape and which can be analyzed to compute folding kinetics. In this paper, we provide evidence that this approach is also well suited to RNA. We compute population kinetics and transition rates on our roadmaps using the master equation for a few moderately sized RNA and show that our results compare favorably with results of other existing methods.


An Adaptive Algorithm Selection Framework, Hao Yu, Dongmin Zhang, Francis Dang, Lawrence Rauchwerger, Technical Report, TR04-002, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, USA, Mar 2004.
Keywords: Adaptive Algorithm, Machine Learning
Links : [Manuscript]

BibTex

@techreport{hyu-tr04-002-2004,
title = "An Adaptive Algorithm Selection Framework",
author = " Yu, Hao and Zhang, Dongmin and Dang, Francis and Rauchwerger, Lawrence",
institution = "Parasol Laboratory, Department of Computer Science, Texas A&M University",
address = "College Station, TX",
number = "TR04-002",
year = 2004,
month = mar
}


A general framework for sampling on the medial axis of the free space, Jyh-Ming Lien, Shawna L. Thomas, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 3, pp. 4439-4444, Taipei, Taiwan, Sep 2003. DOI: 10.1109/ROBOT.2003.1242288
Keywords: Medial Axis, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{1242288,
author={ {Jyh-Ming Lien} and S. L. {Thomas} and N. M. {Amato}},
booktitle={2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)}, title={A general framework for sampling on the medial axis of the free space},
year={2003},
volume={3},
number={},
pages={4439-4444 vol.3},
doi={10.1109/ROBOT.2003.1242288}}


Abstract

We propose a general framework for sampling the configuration space in which randomly generated configurations, free or not, are retracted onto the medial axis of the free space. Generalizing our previous work, this framework provides a template encompassing all possible retraction approaches. It also removes the requirement of exactly computing distance metrics thereby enabling application to more realistic high dimensional problems. In particular, our framework supports methods that retract a given configuration exactly or approximately onto the medial axis. As in our previous work, exact methods provide fast and accurate retraction in low (2 or 3) dimensional space. We also propose new approximate methods that can be applied to high dimensional problems, such as many DOF articulated robots. Theoretical and experimental results show improved performance on problems requiring traversal of narrow passages. We also study tradeoffs between accuracy and efficiency for different levels of approximation, and how the level of approximation effects the quality of the resulting roadmap.


Extracting Optimal Paths from Roadmaps for Motion Planning, Jinsuck Kim, Roger A. Pearce, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 2, pp. 2424-2429, Taipei, Taiwan, Sep 2003. DOI: 10.1109/ROBOT.2003.1241956
Keywords: Path Planning, Query Customization
Links : [Published]

BibTex

@INPROCEEDINGS{1241956,
author={ {Jinsuck Kim} and R. A. {Pearce} and N. M. {Amato}},
booktitle={2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)}, title={Extracting optimal paths from roadmaps for motion planning},
year={2003},
volume={2},
number={},
pages={2424-2429 vol.2},
doi={10.1109/ROBOT.2003.1241956}}


Abstract

We present methods for extracting optimal paths from motion planning roadmaps. Our system enables any combination of optimization criteria, such as collision detection, kinematic/dynamic constraints, or minimum clearance, and relaxed definitions of the goal state, to be used when selecting paths from roadmaps. Our algorithm is an augmented version of Dijkstra's shortest path algorithm which allows edge weights to be defined relative to the current path. We present simulation results maximizing minimum path clearance, minimizing localization effort, and enforcing kinematic/dynamic constraints.


Feature-Based Localization using Scannable Visibility Sectors, Jinsuck Kim, Roger A. Pearce, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 2, pp. 2854-2859, Taipei, Taiwan, Sep 2003. DOI: 10.1109/ROBOT.2003.1242025
Keywords: Mobile Robots, Navigation, Sensor Systems
Links : [Published]

BibTex

@INPROCEEDINGS{1242025,
author={ {Jinsuck Kim} and R. A. {Pearce} and N. M. {Amato}},
booktitle={2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)}, title={Feature-based localization using scannable visibility sectors},
year={2003},
volume={2},
number={},
pages={2854-2859 vol.2},
doi={10.1109/ROBOT.2003.1242025}}


Abstract

This paper presents methods for navigating and localizing mobile robots in a known indoor environment. We introduce a restricted visibility concept called a scannable sector that can aid many existing navigation and localization algorithms. The scannable sectors are based on the physical characteristics of the environment and the limitations of the localization sensors used. We describe a complete navigation system that includes a scannable sector based localizer, sonar sensors, and a probabilistic roadmap path planner. Simulation and hardware results using a real robot with sonar sensors show the potential of our approach.


Improving the Connectivitiy of PRM Roadmaps, Marco Morales, Samuel Rodriguez, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 3, pp. 4427-4432, Taipei, Taiwan, Sep 2003. DOI: 10.1109/ROBOT.2003.1242286
Keywords: Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{1242286,
author={M. {Morales} and S. {Rodriguez} and N. M. {Amato}},
booktitle={2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)}, title={Improving the connectivity of PRM roadmaps},
year={2003},
volume={3},
number={},
pages={4427-4432 vol.3},
doi={10.1109/ROBOT.2003.1242286}}


Abstract

In this paper we investigate how the coverage and connectedness of PRM roadmaps can be improved by adding a connected component (CC) connection step to the general PRM framework. We provide experimental results establishing that significant roadmap improvements can be obtained relatively efficiently by utilizing a suite of CC connection methods, which include variants of existing methods such as RRT and a new ray tracing based method. The coordinated application of these techniques is enabled by methods for selecting and scheduling pairs of nodes in different CCs for connection attempts. In addition to identifying important and/or promising regions of C-space for exploration, these methods also provide a mechanism for controlling the cost of the connection attempts. In our experiments, the time required by the improvement phase was on the same order as the time used to generate the initial roadmap.


A General Framework for PRM Motion Planning, Guang Song, Shawna Thomas, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 3, pp. 4445-4450, Taipei, Taiwan, Sep 2003. DOI: 10.1109/ROBOT.2003.1242289
Keywords: Lazy Evaluation, Query Customization, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{1242289,
author={ {Guang Song} and S. {Thomas} and N. M. {Amato}},
booktitle={2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)}, title={A general framework for PRM motion planning},
year={2003},
volume={3},
number={},
pages={4445-4450 vol.3},
doi={10.1109/ROBOT.2003.1242289}}


Abstract

An important property of PRM roadmaps is that they provide a good approximation of the connectivity of the free C-space. We present a general framework for building and querying probabilistic roadmaps that includes all previous PRM variants as special cases. In particular, it supports no, complete, or partial node and edge validation and various evaluation schedules for path validation, and it enables path customization for variable, adaptive query requirements. While each of the above features is present in some PRM variant, the general framework proposed here is the only one to include them all. Our framework enables users to choose the best approximation level for their problem. Our experimental evidence shows this can result in significant performance gains.


A Performance Model of non-Deterministic Particle Transport on Large-Scale Systems, Mark M. Mathis, Darren J. Kerbyson, Adolfy Hoisie, Computational Science (ICCS 2003), Lecture Notes in Computer Science, Vol: 2659, Melbourne, Australia, Jun 2003. DOI: 10.1007/3-540-44863-2_89
Keywords: High-Performance Computing, Parallel Algorithms
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-44863-2_89,
author="Mathis, Mark M.
and Kerbyson, Darren J.
and Hoisie, Adolfy",
editor="Sloot, Peter M. A.
and Abramson, David
and Bogdanov, Alexander V.
and Gorbachev, Yuriy E.
and Dongarra, Jack J.
and Zomaya, Albert Y.",
title="A Performance Model of Non-deterministic Particle Transport on Large-Scale Systems",
booktitle="Computational Science --- ICCS 2003",
year="2003",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="905--915",
abstract="In this work we present a predictive analytical model that encompasses the performance and scaling characteristics of a non-deterministic particle transport application, MCNP. Previous studies on the scalability of parallel Monte Carlo eigenvalue calculations have been rather general in nature [[1]]. It can be used for the simulation of neutron, photon, electron, or coupled transport, and has found uses in many problem areas. The performance model is validated against measurements on an AlphaServer ES40 system showing high accuracy across many processor / problem combinations. It is parametric with both application characteristics (e.g. problem size), and system characteristics (e.g. communication latency, bandwidth, achieved processing rate) serving as input. The model is used to provide insight into the achievable performance that should be possible on systems containing thousands of processors and to quantify the impact that possible improvements in sub-system performance may have. In addition, the impact on performance of modifying the communication structure of the code is also quantified.",
isbn="978-3-540-44863-1"
}


Abstract

In this work we present a predictive analytical model that encompasses the performance and scaling characteristics of a non-deterministic particle transport application, MCNP. Previous studies on the scalability of parallel Monte Carlo eigenvalue calculations have been rather general in nature [[1]]. It can be used for the simulation of neutron, photon, electron, or coupled transport, and has found uses in many problem areas. The performance model is validated against measurements on an AlphaServer ES40 system showing high accuracy across many processor / problem combinations. It is parametric with both application characteristics (e.g. problem size), and system characteristics (e.g. communication latency, bandwidth, achieved processing rate) serving as input. The model is used to provide insight into the achievable performance that should be possible on systems containing thousands of processors and to quantify the impact that possible improvements in sub-system performance may have. In addition, the impact on performance of modifying the communication structure of the code is also quantified.


ARMI: An Adaptive, Platform Independent Communication Library, Steven Saunders, Lawrence Rauchwerger, Proceedings of the Ninth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 230-241, San Diego, California, USA, Jun 2003. DOI: 10.1145/781498.781534
Keywords: Communication Library, Remote Method Invocation, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/781498.781534,
author = {Saunders, Steven and Rauchwerger, Lawrence},
title = {ARMI: An Adaptive, Platform Independent Communication Library},
year = {2003},
isbn = {1581135882},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/781498.781534},
doi = {10.1145/781498.781534},
abstract = {ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP, MPI implementations and/or a combination thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC and IBM RS6000 SP cluster.},
booktitle = {Proceedings of the Ninth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
pages = {230–241},
numpages = {12},
keywords = {MPI, parallel programming, RPC, Pthreads, communication library, OpenMP, run-time system, RMI},
location = {San Diego, California, USA},
series = {PPoPP \'03}
}

@article{10.1145/966049.781534,
author = {Saunders, Steven and Rauchwerger, Lawrence},
title = {ARMI: An Adaptive, Platform Independent Communication Library},
year = {2003},
issue_date = {October 2003},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {38},
number = {10},
issn = {0362-1340},
url = {https://doi.org/10.1145/966049.781534},
doi = {10.1145/966049.781534},
abstract = {ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP, MPI implementations and/or a combination thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC and IBM RS6000 SP cluster.},
journal = {SIGPLAN Not.},
month = jun,
pages = {230–241},
numpages = {12},
keywords = {RPC, RMI, Pthreads, run-time system, OpenMP, parallel programming, MPI, communication library}
}


Abstract

ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP, MPI implementations and/or a combination thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC and IBM RS6000 SP cluster.


Neuron PRM: A Framework for Constructing Cortical Networks, Jyh-Ming Lien, Marco Morales, Nancy M. Amato, Neurocomputing, Vol: 54, pp. 191-197, Jun 2003. DOI: 10.1016/S0925-2312(02)00728-2
Keywords: Machine Learning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{LIEN2003191,
title = "Neuron PRM: a framework for constructing cortical networks",
journal = "Neurocomputing",
volume = "52-54",
pages = "191 - 197",
year = "2003",
note = "Computational Neuroscience: Trends in Research 2003",
issn = "0925-2312",
doi = "https://doi.org/10.1016/S0925-2312(02)00728-2",
url = "http://www.sciencedirect.com/science/article/pii/S0925231202007282",
author = "Jyh-Ming Lien and Marco Morales and Nancy M. Amato",
keywords = "Cortical networks, PRM, BTS, L-system, Rectangle tree",
abstract = "The brain's extraordinary computational power to represent and interpret complex natural environments is essentially determined by the topology and geometry of the brain's architectures. We present a framework to construct cortical networks which borrows from probabilistic roadmap methods developed for robotic motion planning. We abstract the network as a large-scale directed graph, and use L-systems and statistical data to ‘grow’ neurons that are morphologically indistinguishable from real neurons. We detect connections (synapses) between neurons using geometric proximity tests."
}


Abstract

The brain's extraordinary computational power to represent and interpret complex natural environments is essentially determined by the topology and geometry of the brain's architectures. We present a framework to construct cortical networks which borrows from probabilistic roadmap methods developed for robotic motion planning. We abstract the network as a large-scale directed graph, and use L-systems and statistical data to ‘grow’ neurons that are morphologically indistinguishable from real neurons. We detect connections (synapses) between neurons using geometric proximity tests.


Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures, Nancy M. Amato , Ken A. Dill , Guang Song, Journal of Computational Biology, Vol: 10, Issue: 3-4, pp. 239-255, Jun 2003. DOI: 10.1089/10665270360688002
Keywords: Protein Folding
Links : [Published]

BibTex

@article{amato2003using,
title={Using motion planning to map protein folding landscapes and analyze folding kinetics of known native structures},
author={Amato, Nancy M and Dill, Ken A and Song, Guang},
journal={Journal of computational biology},
volume={10},
number={3-4},
pages={239--255},
year={2003},
publisher={Mary Ann Liebert, Inc.}
}


Abstract

We investigate a novel approach for studying the kinetics of protein folding. Our framework has evolved from robotics motion planning techniques called probabilistic roadmap methods (PRMs) that have been applied in many diverse fields with great success. In our previous work, we presented our PRM-based technique and obtained encouraging results studying protein folding pathways for several small proteins. In this paper, we describe how our motion planning framework can be used to study protein folding kinetics. In particular, we present a refined version of our PRM-based framework and describe how it can be used to produce potential energy landscapes, free energy landscapes, and many folding pathways all from a single roadmap which is computed in a few hours on a desktop PC. Results are presented for 14 proteins. Our ability to produce large sets of unrelated folding pathways may potentially provide crucial insight into some aspects of folding kinetics, such as proteins that exhibit both two-state and three-state kinetics that are not captured by other theoretical techniques.


Object Oriented Abstractions for Communication in Parallel Programs, Steven Saunders, Masters Thesis, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, USA, May 2003.
Keywords: Communication Library, Runtime Systems, STAPL
Links : [Published]

BibTex

N/A


Abstract

This thesis details ARMI, a parallel communication library that provides an advanced implementation of the remote method invocation protocol (RMI), which is well suited to object-oriented programs. ARMI is a framework for expressing fine-grain parallelism in C++, and mapping it to a particular machine using shared-memory and message passing library calls. It handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be adapted for different platforms to allow user codes to achieve the highest possible performance without manual modification. ARMI is used by STAPL, a generic, object-oriented, parallel C++ library, to provide a portable communication infrastructure. The basic design decisions are described, as well as a detailed analysis of the mechanisms used in the current OpenMP/Pthreads, MPI, and mixed-mode implementations. ARMI’s performance is compared with both hand-coded Pthreads and MPI on a variety of machines, including an HP-V2200, Origin 3800, IBM Regatta and IBM RS6000 SP.


Enveloping Obstacles with Hexagonal Metamorphic Robots, Jennifer E. Walter, Mary E. Brooks, David F. Little, Nancy M. Amato, Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 1, pp. 741-748, Taipei, Taiwan, Jan 2003. DOI: 10.1109/ROBOT.2003.1241682
Keywords: Mobile Robots, Motion Planning, Multi-Agent
Links : [Published]

BibTex

@INPROCEEDINGS{1241682,
author={J. E. {Walter} and E. M. {Tsai} and N. M. {Amato}},
booktitle={2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422)}, title={Enveloping obstacles with hexagonal metamorphic robots},
year={2003},
volume={1},
number={},
pages={741-748 vol.1},
doi={10.1109/ROBOT.2003.1241682}}


Abstract

The problem addressed is the distributed reconfiguration of the metamorphic robot system composed of any number of two dimensional robots (modules). The initial configuration we consider is a straight chain of modules, while the goal configuration satisfies a simple admissibility condition. Our reconfiguration strategy depends on finding a contiguous path of cells, called a substrate path that spans the goal configuration. Modules fill in this substrate path and then move along the path to fill in the remainder of the goal without collision or deadlock. In this paper, we address the problem of reconfiguration when a single obstacle is embedded in the goal environment. We introduce a classification for traversable surfaces, which allows for coherence in defining admissibility characteristics for various objects in the hexagonal grid. We present algorithms to 1) determine if an obstacle embedded in the goal fulfills a simple admissibility requirement, 2) include an admissible obstacle in a substrate path, and 3) accomplish distributed reconfiguration.


A Path Planning-based Study of Protein Folding With a Case Study of Hairpin Formation in Protein G and L, Guang Song, Shawna Thomas, Ken A. Dill, J. Martin Scholtz, Nancy M. Amato, In Proc. Pac. Symp. of Biocomputing (PSB), Vol: 8, pp. 240-251, Lihue, Hawaii, Jan 2003.
Keywords: Computational Biology, Protein Folding, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{Song_2003,
author={Guang {Song}, Shawna {Thomas}, Ken A. {Dill}, J. Martin {Scholtz}, Nancy M. {Amato}},
booktitle={In Proc. Pac. Symp. of Biocomputing (PSB)},
title={A Path Planning-based Study of Protein Folding With a Case Study of Hairpin Formation in Protein G and L},
year={2003}, volume={8}, number={}, pages={240-251},
doi={}
}


Abstract

We investigate a novel approach for studying protein folding that has evolvedfrom robotics motion planning techniques calledprobabilistic roadmapmethods(prms). Our focus is to study issues related to the folding process, such as theformation of secondary and tertiary structure,assumingweknowthenativefold.A feature of ourprm-based framework is that the large sets of folding pathwaysin the roadmaps it produces, in a few hours on a desktop PC, provide globalinformation about the protein’s energy landscape. This is an advantage over othersimulation methods such as molecular dynamics or Monte Carlo methods whichrequire more computation and produce only a single trajectory in each run. Inour initial studies, we obtained encouraging results for several small proteins. Inthis paper, we investigate more sophisticated techniques for analyzing the foldingpathways in our roadmaps. In addition to more formally revalidating our previousresults, we present a case study showing our technique captures known foldingdifferences between the structurally similar proteins G and L.


Better Group Behaviors Using Rule-Based Roadmaps, Osman Burçhan Bayazit, Jyh-Ming Lien, Nancy M. Amato, In Proc. Int. Wkshp. on Alg. Found. of Rob. (WAFR), Vol: 7, pp. 95-111, Nice, France, Dec 2002. DOI: 10.1007/978-3-540-45058-0_7
Keywords: Group Behavior, Multi-Agent, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{10.1007/978-3-540-45058-0_7,
author={Osman Burçhan {Bayazit}, Jyh-Ming {Lien}, Nancy M. {Amato}},
booktitle={In Proc. Int. Wkshp. on Alg. Found. of Rob. (WAFR)},
title={Better Group Behaviors Using Rule-Based Roadmaps},
year={2002}, volume={7}, number={}, pages={95-111},
doi={10.1007/978-3-540-45058-0_7}}


Abstract

While techniques exist for simulating group behaviors, these methods usually only provide simplistic navigation and planning capabilities. In this work, we explore the benefits of integrating roadmap-based path planning methods with flocking techniques. We show how group behaviors such as exploring can be facilitated by using dynamic roadmaps (e.g., modifying edge weights) as an implicit means of communication between flock members. Extending ideas from cognitive modeling, we embed behavior rules in individual flock members and in the roadmap. These behavior rules enable the flock members to modify their actions based on their current location and state. We propose new techniques for three distinct group behaviors: homing, exploring (covering and goal searching) and passing through narrow areas. Animations of these behaviors can be viewed at http://parasol.tamu.edu/dsmft.


A parallel communication infrastructure for STAPL, Steven Saunders, Lawrence Rauchwerger, Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL), New York, NY, USA, Jun 2002.
Keywords: Communication Library, Runtime Systems, STAPL
Links : [Published] [Talk]

BibTex

N/A


Abstract

Communication is an important but difficult aspect of parallel programming. This paper describes a parallel communication infrastructure, based on remote method invocation, to simplify parallel programming by abstracting lowlevel shared-memory or message passing details while maintaining high performance and portability. STAPL, the Standard Template Adaptive Parallel Library, builds upon this infrastructure to make communication transparent to the user. The basic design is discussed, as well as the mechanisms used in the current Pthreads and MPI implementations. Performance comparisons between STAPL and explicit Pthreads or MPI are given on a variety of machines, including an HPV2200, Origin 3800 and a Linux Cluster.


The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops, Francis Dang, Hao Yu, Lawrence Rauchwerger, Proceedings 16th International Parallel and Distributed Processing Symposium (IPDPS), pp. 10-, Ft. Lauderdale, FL, USA, Apr 2002. DOI: 10.1109/IPDPS.2002.1015493
Keywords: Parallel Processing, Parallel Programming, Parallelising Compilers
Links : [Published]

BibTex

@INPROCEEDINGS{1015493, author={F. {Dang} and {Hao Yu} and L. {Rauchwerger}}, booktitle={Proceedings 16th International Parallel and Distributed Processing Symposium}, title={The R-LRPD test: speculative parallelization of partially parallel loops}, year={2002}, volume={}, number={}, pages={10 pp-}, doi={10.1109/IPDPS.2002.1015493}}


Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. In our previously proposed framework we have speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor dependences; if the test failed, then the loop was re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. We now propose a generalization of our speculative doall parallelization technique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the runtime dependence test itself. We present the base algorithm and an analysis of the different heuristics for its practical application and a few experimental results on loops from Track, Spice, and FMA3D.


SmartApps, An Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations, Francis Dang, Maria Jesus Garzaran, Milos Prvulovic, Ye Zhang, Alin Jula, Hao Yu, Nancy Amato, Lawrence Rauchwerger, Josep Torrellas, Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS), pp. 172-181, Fort Lauderdale, Florida, USA, Apr 2002. DOI: 10.1109/IPDPS.2002.1016572
Keywords: Computer Architecture, High-Performance Computing, Optimizing Compilers
Links : [Published]

BibTex

@INPROCEEDINGS{1016572, author={F. {Dang} and M. {Jesus Garzaran} and M. {Prvulovic} and {Ye Zhang} and A. {Jula} and {Hao Yu} and N. {Amato} and L. {Rauchwerger} and J. {Torrellas}}, booktitle={Proceedings 16th International Parallel and Distributed Processing Symposium}, title={Smartapps, an application centric approach to high performance computing: compiler-assisted software and hardware support for reduction operations}, year={2002}, volume={}, number={}, pages={10 pp-}, doi={10.1109/IPDPS.2002.1016572}}


Abstract

State-of-the-art run-time systems are a poor match to diverse, dynamic distributed applications because they are designed to provide support to a wide variety of applications, without much customization to individual specific requirements. Little or no guiding information flows directly from the application to the run-time system to allow the latter to fully tailor its services to the application. As a result, the performance is disappointing. To address this problem, we propose application-centric computing, or SMART APPLICATIONS. In the executable of smart applications, the compiler embeds most run-time system services, and a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures the application and the OS/hardware platform. At run-time, after incorporating the code's input and the system's resources and state, the SMARTAPP performs a global optimization. This optimization is instance specific and thus much more tractable than a global generic optimization between application, OS and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of SMARTAPPS and then present some achievements to date, focusing on compiler-assisted software and hardware techniques for parallelizing reduction operations. These illustrate SMARTAPPS use of adaptive algorithm selection and moderately reconfigurable hardware.


Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors, Maria Jesus Garzaran, Milos Prvulovic, Ye Zhang, Alin Jula, Hao Yu, Lawrence Rauchwerger, Josep Torrellas, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 243-254, Barcelona, Spain, Spain, Sep 2001. DOI: 10.1109/PACT.2001.953304
Keywords: Computer Architecture, High-Performance Computing, Optimizing Compilers
Links : [Published]

BibTex

@INPROCEEDINGS{953304, author={M. J. {Garzaran} and M. {Prvulovic} and {Ye Zhang} and A. {Jula} and {Hao Yu} and L. {Rauchwerger} and J. {Torrellas}}, booktitle={Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques}, title={Architectural support for parallel reductions in scalable shared-memory multiprocessors}, year={2001}, volume={}, number={}, pages={243-254}, doi={10.1109/PACT.2001.953304}}


Abstract

Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reduction and makes it scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Experimental results based on simulations show that the proposed support is very effective. While conventional software-only reduction parallelization delivers average speedups of only 2.7 for 16 processors, our scheme delivers average speedups of 7.6.


STAPL: An Adaptive, Generic Parallel C++ Library, Ping An, Alin Jula, Silvius Rus, Steven Saunders, Tim Smith, Gabriel Tanase, Nathan Thomas, Nancy Amato, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC.) Lecture Notes in Computer Science, Vol: 2624, pp. 193-208, Cumberland Falls, KY, USA, Aug 2001. DOI: 10.1007/3-540-35767-X_13
Keywords: High-Performance Computing, Parallel Programming, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-35767-X_13,
author="An, Ping
and Jula, Alin
and Rus, Silvius
and Saunders, Steven
and Smith, Tim
and Tanase, Gabriel
and Thomas, Nathan
and Amato, Nancy
and Rauchwerger, Lawrence",
editor="Dietz, Henry G.",
title="STAPL: An Adaptive, Generic Parallel C++ Library",
booktitle="Languages and Compilers for Parallel Computing",
year="2003",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="193--208",
abstract="The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism for applications which use dynamically linked data structures such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms. STAPL provides several different algorithms for some library routines, and selects among them adaptively at runtime. STAPL can replace STL automatically by invoking a preprocessing translation phase. In the applications studied, the performance of translated code was within 5{\%} of the results obtained using STAPL directly. STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.",
isbn="978-3-540-35767-4"
}


Abstract

The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism for applications which use dynamically linked data structures such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms. STAPL provides several different algorithms for some library routines, and selects among them adaptively at runtime. STAPL can replace STL automatically by invoking a preprocessing translation phase. In the applications studied, the performance of translated code was within 5% of the results obtained using STAPL directly. STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.


Finding Strongly Connected Components in Parallel in Particle Transport Sweeps, William McLendon, Bruce Hendrickson, Steve Plimpton, Lawrence Rauchwerger, Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 328-329, Crete Island, Greece, Jul 2001. DOI: 10.1145/378580.378751
Keywords: High-Performance Computing, Parallel Algorithms
Links : [Published]

BibTex

@inproceedings{10.1145/378580.378751,
author = {Mclendon, William and Hendrickson, Bruce and Plimpton, Steve and Rauchwerger, Lawrence},
title = {Finding Strongly Connected Components in Parallel in Particle Transport Sweeps},
year = {2001},
isbn = {1581134096},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/378580.378751},
doi = {10.1145/378580.378751},
abstract = {Discrete ordinates methods are commonly used to simulate radiation transport for fire or weapons modeling. The computation proceeds by sweeping the flux across a grid. A particular cell cannot be computed until all the cells immediately upwind of it are finished. If the directed dependence graph for the grid cells contains a cycle then sweeping methods will deadlock. This can happen in unstructured grids and time stepped problems where the grid is allowed to deform. In this paper we present a parallel algorithm to detect cycles in the dependence graphs present in these grids as well as an implementation and experimental results on shared and distributed memory machines.},
booktitle = {Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures},
pages = {328–329},
numpages = {2},
location = {Crete Island, Greece},
series = {SPAA '01}
}


Abstract

Discrete ordinates methods are commonly used to simulate radiation transport for fire or weapons modeling. The computation proceeds by sweeping the flux across a grid. A particular cell cannot be computed until all the cells immediately upwind of it are finished. If the directed dependence graph for the grid cells contains a cycle then sweeping methods will deadlock. This can happen in unstructured grids and time stepped problems where the grid is allowed to deform. In this paper we present a parallel algorithm to detect cycles in the dependence graphs present in these grids as well as an implementation and experimental results on shared and distributed memory machines.


STAPL: A standard template adaptive parallel C++ library, Ping An, Alin Jula, Silvius Rus, Steven Saunders, Tim Smith, Gabriel Tanase, Nathan Thomas, Nancy Amato, Lawrence Rauchwerger, Int. Wkshp on Adv. Compiler Technology for High Perf. and Embedded Processors, pp. 10-, Bucharest, Romania, Jul 2001.
Keywords: C++, Parallel Libraries, STAPL
Links : [Published]

BibTex

N/A


Abstract

The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni-or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism in areas such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms, which use dynamically linked data structures. STAPL provides several different algorithms for some library routines, and selects among them adaptively at run-time. STAPL can replace STL automatically by invoking a preprocessing translation phase. The performance of translated code is close to the results obtained using STAPL directly (less than 5% performance deterioration). However, STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.


A motion planning approach to folding: from paper craft to protein folding, Guang Song, Nancy M. Amato, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol: 1, pp. 948-953, Seoul, Korea, May 2001. DOI: 10.1109/ROBOT.2001.932672
Keywords: Path Planning, Protein Folding, Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{932672,
author={ {Guang Song} and N. M. {Amato}},
booktitle={Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)}, title={A motion planning approach to folding: from paper craft to protein folding},
year={2001},
volume={1},
number={},
pages={948-953 vol.1},
doi={10.1109/ROBOT.2001.932672}}


Abstract

We present a framework for studying folding problems from a motion planning perspective. Modeling foldable objects as tree-like multi-link objects allows one to apply motion planning techniques to folding problems. An important feature of this approach is that it not only allows one to study foldability questions, such as, can an object be folded (or unfolded) into another object, but also provides one with another tool for investigating the dynamic folding process itself. The framework proposed here has application to traditional motion planning areas such as automation and animation, and presents a novel approach for studying protein folding pathways. Preliminary experimental results with traditional paper crafts (e.g., box folding) and small proteins (approximately 60 residues) are quite encouraging.


Customizing PRM Roadmaps at Query Time, Guang Song, S. Miller, N.M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Seoul, South Korea, May 2001. DOI: 10.1109/ROBOT.2001.932823
Keywords: Query Customization, Sampling-Based Motion Planning
Links : [Published]

BibTex

@inproceedings{song2001customizing,
title={Customizing PRM roadmaps at query time},
author={Song, Guang and Miller, Shawna and Amato, Nancy M},
booktitle={Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No. 01CH37164)},
volume={2},
pages={1500--1505},
year={2001},
organization={IEEE}
}


Abstract

We propose an approach for building and querying probabilistic roadmaps. In the roadmap construction stage, we build coarse roadmaps by performing only an approximate validation of the roadmap nodes and/or edges. In the query stage, the roadmap is validated and refined only in the area of interest for the query, and moreover is customized in accordance with any specified query preferences. This approach, which postpones some of the validation checks (e.g., collision checks) to the query phase, yields more efficient solutions to many problems. An important benefit of our approach is that it gives one the ability to customize the same roadmap in accordance with multiple, variable, query preferences. For example our approach enables one to find a path which maintains a particular clearance, or makes at most some specified number of sharp turns. Our preliminary results on problems drawn from diverse application domains show that this new approach dramatically improves performance, and shows remarkable flexibility when adapting to different query requirements.


An Integrated Mobile Robot Path (Re)Planner and Localizer for Personal Robots, Jinsuck Kim, Nancy Amato, Sooyong Lee, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Seoul, South Korea, May 2001. DOI: 10.1109/ROBOT.2001.933208
Keywords: Mobile Robots, Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{933208, author={ {Jinsuck Kim} and N. M. {Amato} and {Sooyong Lee}}, booktitle={Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)}, title={An integrated mobile robot path (re)planner and localizer for personal robots}, year={2001}, volume={4}, number={}, pages={3789-3794 vol.4}, doi={10.1109/ROBOT.2001.933208}}


Abstract

We describe a method for navigation in a known indoor environment, such as a home or office, that requires only inexpensive range sensors. Our framework includes a high-level planner which integrates and coordinates path planning and localization modules with the aid of a module for computing regions which are expected, with high probability, to contain the robot at any given time. The localization method is based on simple geometric properties of the environment which are computed during a preprocessing stage. The roadmap-based path planner enables one to select routes, and subgoals along those routes, that will facilitate localization and other optimization criteria. In addition, our framework enables one to quickly plan new routes, dynamically, based on the current position as computed by intermediate localization operations. We present simulation and hardware experimental results that illustrate the practicality and potential of our approach.


Hybrid Dynamic Simulation of Rigid-Body Contact with Coulomb Friction, Wookho Son, J.C. Trinkle, Nancy Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Seoul, South Korea, May 2001. DOI: 10.1109/ROBOT.2001.932802
Keywords: Kinodynamic Planning, Robotics
Links : [Published]

BibTex

@inproceedings{son2001hybrid,
title={Hybrid dynamic simulation of rigid-body contact with Coulomb friction},
author={Son, Wookho and Trinkle, Jeffrey C and Amato, Nancy M},
booktitle={Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No. 01CH37164)},
volume={2},
pages={1376--1381},
year={2001},
organization={IEEE}
}


Disassembly Sequencing Using a Motion Planning Approach, Sujay Sundaram, Ian Remmler, Nancy Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Seoul, South Korea, May 2001. DOI: 10.1109/ROBOT.2001.932818
Keywords: Disassembly Planning, Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{932818, author={S. {Sundaram} and I. {Remmler} and N. M. {Amato}}, booktitle={Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)}, title={Disassembly sequencing using a motion planning approach}, year={2001}, volume={2}, number={}, pages={1475-1480 vol.2}, doi={10.1109/ROBOT.2001.932818}}


Abstract

Our motion planning based approach treats the parts in the assembly as robots and operates in the composite configuration space of the parts' individual configuration spaces. Randomized techniques inspired by recent motion planning methods are used to sample configurations in this space. Since typical assemblies consist of many parts, the corresponding composite C-spaces have high dimensionality. Also, since many important configurations for the disassembly sequence will involve closely packed parts, the disassembly problem suffers from the so-called narrow passage problem. We bias the sampling by computing potential movement directions based on the geometric characteristics of configurations known to be reachable from the assembled configuration. We construct a disassembly tree which is rooted at the starting assembled configuration. Our experimental results with several non-trivial puzzle-like assemblies show the potential of this approach.


Probabilistic Roadmaps-Putting It All Together, Lucia K. Dale, Nancy M. Amato, In Proc. IEEE International Conference on Robotics and Automation (ICRA), Seoul, South Korea, May 2001. DOI: 10.1109/ROBOT.2001.932892
Keywords: Sampling-Based Motion Planning
Links : [Published]

BibTex

@INPROCEEDINGS{932892, author={L. K. {Dale} and N. M. {Amato}}, booktitle={Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)}, title={Probabilistic roadmaps-putting it all together}, year={2001}, volume={2}, number={}, pages={1940-1947 vol.2}, doi={10.1109/ROBOT.2001.932892}}


Abstract

Given a robot and a workspace, probabilistic roadmap planners (PRMs) build a roadmap of paths sampled from the workspace. A roadmap node is a single collision-free robot configuration, randomly generated. A roadmap edge is a sequence of collision-free robot configurations which interpolate the path from one roadmap node to another. Queries to the roadmap are (start, goal) pairs. If both the start and goal of a pair can be connected to the same connected component of the roadmap, the query is solved. Many promising variants of the PRM have been proposed, each with their own strengths and weaknesses. We propose a meta-planner for using many PRMs in such a way that the strengths are combined and the weaknesses offset. Our meta-planner will perform the combination in the following manner: i) provide a framework in which different motion planners are available and to which new ones are easily added; ii) characterize subregions (possibly overlapping) based on sample characteristics and connection results; iii) assign subregions to one or more planners which are judged promising; and iv) provide stopping criteria for roadmap construction. We present experimental results for four characterization measures. A general technique we call 'filtering' is presented for keeping roadmaps compact.


Enhancing Randomized Motion Planners: Exploring with Haptic Hints, O. Burçhan Bayazit, Guang Song, Nancy M. Amato , Autonomous Robots, Vol: 10, Issue: 2, pp. 163-174, Mar 2001. DOI: 10.1023/A:1008981903273
Keywords: Motion Planning, Sampling-Based Motion Planning
Links : [Published]

BibTex

@article{bayazit2001enhancing,
title={Enhancing randomized motion planners: Exploring with haptic hints},
author={Bayazit, O Burchan and Song, Guang and Amato, Nancy M},
journal={Autonomous Robots},
volume={10},
number={2},
pages={163--174},
year={2001},
publisher={Springer}
}


Abstract

In this paper, we investigate methods for enabling a human operator and an automatic motion planner to cooperatively solve a motion planning query. Our work is motivated by our experience that automatic motion planners sometimes fail due to the difficulty of discovering ‘critical’ configurations of the robot that are often naturally apparent to a human observer. Our goal is to develop techniques by which the automatic planner can utilize (easily generated) user-input, and determine ‘natural’ ways to inform the user of the progress made by the motion planner. We show that simple randomized techniques inspired by probabilistic roadmap methods are quite useful for transforming approximate, user-generated paths into collision-free paths, and describe an iterative transformation method which enables one to transform a solution for an easier version of the problem into a solution for the original problem. We also illustrate that simple visualization techniques can provide meaningful representations of the planner's progress in a 6-dimensional C-space. We illustrate the utility of our methods on difficult problems involving complex 3D CAD Models.


A General Performance Model for Parallel Sweeps on Orthogonal Grids for Particle Transport Calculations, Mark M. Mathis, Masters Thesis, Department of Computer Science and Engineering, Texas A&M University, Dec 2000.
Keywords: High-Performance Computing, Parallel Algorithms
Links : [Published]

BibTex

@mastersthesis{CitekeyMastersthesis,
author = "Jian Tang",
title = "Spin structure of the nucleon in the asymptotic limit",
school = "Texas A&M University",
year = 2000,
address = "College Station, TX",
month = Dec
}


Abstract

There is a growing need to accurately simulate physical systems whose evolution depends on the transport of subatomic particles. It has long been recognized that the huge computational demands of the transport problem mean that practical solution times will be obtained only by the efficient utilization of parallel processing. For example, since estimates place the time devoted to particle transport in multi-physics simulations at 50-80% of total execution time, parallelizing deterministic particle transport calculations is an important problem in many applications targeted by the Accelerated Strategic Computing Initiative of the United States Department of Energy. One common approach to deterministic particle transport calculations is the discrete-ordinates method, whose most time consuming step is the transport sweep which involves multiple sweeps through the spatial grid, one for each direction of particle travel. The efficient parallel implementation of the transport sweeps is the key to parallelizing the discrete-ordinates method. The key contribution of this thesis is a new general model that can be used to compare the running times of transport sweeps on three-dimensional orthogonal grids for various mappings of the grid cells to processors. Our model, which includes machine-dependent parameters such as computation cost and communication latency, can be used to analyze and compare the effects of various spatial decompositions on the running time of the transport sweep. Insight obtained from the model yields two significant contributions to the theory of optimal transport sweeps on orthogonal grids. First, our model provides a theoretical basis that explains why, and under what circumstances, the column decomposition of the current standard KBA algorithm is superior to the 'balanced' decomposition obtained by classic domain decomposition techniques. Second, our model enables us to identify a new decomposition, which we call Hybrid, that proves to be almost as good as and often better than the current standard KBA method. We obtain expressions for the completion time and discuss theoretical results.


Ligand Binding with OBPRM and Haptic User Input: Enhancing Automatic Motion Planning with Virtual Touch, O. Burchan Bayazit , Guang Song , Nancy M. Amato , ACM Digital Library, College Station, Texas, USA, Oct 2000.
Keywords: Ligand Binding, Sampling-Based Motion Planning
Links : [Published]

BibTex

@MISC{Bayazit00ligandbinding,
author = {O. Burchan Bayazit and Guang Song and Nancy M. Amato},
title = {Ligand Binding with OBPRM and Haptic User Input: Enhancing Automatic Motion Planning with Virtual Touch},
year = {2000}
}


Abstract

In this paper, we present a framework for studying ligand binding which is based on techniques recently developed in the robotics motion planning community. We are especially interested in locating binding sites on the protein for a ligand molecule. Our work investigates the performance of a fully automated motion planner, as well improvements obtained when supplementary user input is collected using a haptic device. Our results applying an obstacle-based probabilistic roadmap motion planning algorithm (obprm) to some known protein-ligand pairs are very encouraging. In particular, we were able to automatically generate congurations close to, and correctly identify, the true binding site in the three protein-ligand complexes we tested. We nd that user input helps the planner, and a haptic device helps the user to understand the protein structure by enabling them to feel the forces which are hard to visualize.


SmartApps: An Application Centric Approach to High Performance Computing, Lawrence Rauchwerger, Nancy Amato, Josep Torrellas, Languages and Compilers for Parallel Computing (LCPC.) Lecture Notes in Computer Science, Vol: 2017, pp. 82-96, Yorktown Heights, New York, USA, Aug 2000. DOI: 10.1007/3-540-45574-4_6
Keywords: Computer Architecture, High-Performance Computing, Optimizing Compilers
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-45574-4_6,
author="Rauchwerger, Lawrence
and Amato, Nancy M.
and Torrellas, Josep",
editor="Midkiff, Samuel P.
and Moreira, Jos{\'e} E.
and Gupta, Manish
and Chatterjee, Siddhartha
and Ferrante, Jeanne
and Prins, Jan
and Pugh, William
and Tseng, Chau-Wen",
title="SmartApps: An Application Centric Approach to High Performance Computing",
booktitle="Languages and Compilers for Parallel Computing",
year="2001",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="82--96",
abstract="State-of-the-art run-time systems are a poor match to diverse, dynamic distributed applications because they are designed to provide support to a wide variety of applications, without much customization to individual specific requirements. Little or no guiding information flows directly from the application to the run-time system to allow the latter to fully tailor its services to the application. As a result, the performance is disappointing. To address this problem, we propose application-centric computing, or SMART APPLICATIONS. In the executable of smart applications, the compiler embeds most run-time system services, and a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures the application and the OS/hardware platform. At run-time, after incorporating the code's input and the system's resources and state, the SmartApp performs a global optimization. This optimization is instance specific and thus much more tractable than a global generic optimization between application, OS and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of Smartapps and then present the achievements to date: Run-time optimizations, performance modeling, and moderately reconfigurable hardware.",
isbn="978-3-540-45574-5"
}


Abstract

State-of-the-art run-time systems are a poor match to diverse, dynamic distributed applications because they are designed to provide support to a wide variety of applications, without much customization to individual specific requirements. Little or no guiding information flows directly from the application to the run-time system to allow the latter to fully tailor its services to the application. As a result, the performance is disappointing. To address this problem, we propose application-centric computing, or SMART APPLICATIONS. In the executable of smart applications, the compiler embeds most run-time system services, and a performance-optimizing feedback loop that monitors the application’s performance and adaptively reconfigures the application and the OS/hardware platform. At run-time, after incorporating the code’s input and the system’s resources and state, the SmartApp performs a global optimization. This optimization is instance specific and thus much more tractable than a global generic optimization between application, OS and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of Smartapps and then present the achievements to date: Run-time optimizations, performance modeling, and moderately reconfigurable hardware.


Merging Physical Manipulatives and Digital Interface in Educational Software, Anna Zacchi, Nancy M. Amato, Proceedings of ED-MEDIA 2000 - World Conference on Educational Multimedia, Hypermedia & Telecommunications, pp. 1780-1781, Montreal, Canada, Jun 2000.
Keywords: Elementary Education
Links : [Published]

BibTex

@inproceedings{zacchi-edmedia-2000
, author = \"Anna Zacchi and Nancy M. Amato\"
, title = \"Merging Physical Manipulatives and Digital Interface in Educational Software\"
, booktitle = \"Proceedings of ED-MEDIA 2000 - World Conference on Educational Multimedia, Hypermedia \\& Telecommunications\"
, pages = \"1780-1781\"
, year = \"2000\"
, ISBN = \"978-1-880094-40-2\"
}


Abstract

In this paper we describe how elementary school students used physical manipulatives in conjunction with the digital interface of educational software for geometry. The blending of physical manipulatives and digital interface may help them to overcome the limits of the representation and interaction modalities of the digital interface.


Speculative Parallelization of Partially Parallel Loops, Francis H. Dang, Lawrence Rauchwerger, Languages, Compilers, and Run-Time Systems for Scalable Computers (LCR.) Lecture Notes in Computer Science, Vol: 1915, pp. 285-299, Rochester, New York, USA, May 2000. DOI: 10.1007/3-540-40889-4_22
Keywords: Optimizing Compilers
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-40889-4_22,
author="Dang, Francis H.
and Rauchwerger, Lawrence",
editor="Dwarkadas, Sandhya",
title="Speculative Parallelization of Partially Parallel Loops",
booktitle="Languages, Compilers, and Run-Time Systems for Scalable Computers",
year="2000",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="285--299",
abstract="Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently de- fined access patterns. We have previously proposed a framework for their identifi- cation. We speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor dependences; if the test failed, then the loop was re-executed serially. While this method ex- ploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. In this paper we propose a generalization of our speculative doall parallelization technique, named Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the run-time dependence test itself, i.e., removes the time lost due to incorrect parallel execution. The asymptotic time-complexity is, for fully serial loops, equal to the sequential execution time. We present the base algorithm and an analysis of the different heuristics for its practical application. Some preliminary experimental results on loops from Track will show the performance of this new technique.",
isbn="978-3-540-40889-5"
}


Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently de- fined access patterns. We have previously proposed a framework for their identifi- cation. We speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor dependences; if the test failed, then the loop was re-executed serially. While this method ex- ploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. In this paper we propose a generalization of our speculative doall parallelization technique, named Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the run-time dependence test itself, i.e., removes the time lost due to incorrect parallel execution. The asymptotic time-complexity is, for fully serial loops, equal to the sequential execution time. We present the base algorithm and an analysis of the different heuristics for its practical application. Some preliminary experimental results on loops from Track will show the performance of this new technique.


Techniques for Reducing the Overhead of Run-time Parallelization, Hao Yu, Lawrence Rauchwerger, Compiler Construction (CC.) Lecture Notes in Computer Science, Vol: 1781, pp. 232-248, Berlin, Germany, Mar 2000. DOI: 10.1007/3-540-46423-9_16
Keywords: Parallelising Compilers
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-46423-9_16,
author="Yu, Hao
and Rauchwerger, Lawrence",
editor="Watt, David A.",
title="Techniques for Reducing the Overhead of Run-Time Parallelization",
booktitle="Compiler Construction",
year="2000",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="232--248",
abstract="Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we have introduced a novel framework for their identification: speculative parallelization. While we have previously shown that this method is inherently scalable its practical success depends on the fraction of ideal speedup that can be obtained on modest to moderately large parallel machines. Maximum parallelism can be obtained only through a minimization of the run-time overhead of the method, which in turn depends on its level of integration within a classic restructuring compiler and on its adaptation to characteristics of the parallelized application. We present several compiler and run-time techniques designed specifically for optimizing the run-time parallelization of sparse applications. We show how we minimize the run-time overhead associated with the speculative parallelization of sparse applications by using static control flow information to reduce the number of memory references that have to be collected at run-time. We then present heuristics to speculate on the type and data structures used by the program and thus reduce the memory requirements needed for tracing the sparse access patterns. We present an implementation in the Polaris infrastructure and experimental results.",
isbn="978-3-540-46423-5"
}


Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we have introduced a novel framework for their identification: speculative parallelization. While we have previously shown that this method is inherently scalable its practical success depends on the fraction of ideal speedup that can be obtained on modest to moderately large parallel machines. Maximum parallelism can be obtained only through a minimization of the run-time overhead of the method, which in turn depends on its level of integration within a classic restructuring compiler and on its adaptation to characteristics of the parallelized application. We present several compiler and run-time techniques designed specifically for optimizing the run-time parallelization of sparse applications. We show how we minimize the run-time overhead associated with the speculative parallelization of sparse applications by using static control flow information to reduce the number of memory references that have to be collected at run-time. We then present heuristics to speculate on the type and data structures used by the program and thus reduce the memory requirements needed for tracing the sparse access patterns. We present an implementation in the Polaris infrastructure and experimental results.


Task Scheduling and Parallel Mesh-Sweeps in Transport Computations, Nancy M. Amato, Ping An, Technical Report, TR00-009, Department of Computer Science and Engineering, Texas A&M University, Jan 2000.
Keywords: Algorithmic Skeletons, Parallel Processing, Task Planning
Links : [Manuscript]

BibTex

@techreport{namato-tr00-009-2000,
title = "Task Scheduling and Parallel Mesh-Sweeps in Transport Computations",
author = "Nancy M. Amato, Ping An",
institution = "Texas A&M University",
address = "College Station, TX",
number = "TR00-009",
year = 2000,
month = jan
}


Abstract

In this report, we describe the relationship between task scheduling and deterministic mesh sweeps that arise in particle-transport computations. In particular, we argue that the efficient parallelization of such computations is most accurately viewed as a generalization of the traditional task scheduling problem and not as an application for domain decomposition, as has been generally assumed in the past. In fact, as we show, the transport problem represents an interesting composite task-scheduling problem: given one set of tasks, and multiple dependence graphs for these tasks (one for each distinguishable sweep direction), find an assignment of tasks to processors that minimizes the time required to process all such graphs. Within this context, our goal is to study and propose scheduling algorithms that are suitable for the particular generalization of the scheduling problem that arises in the context of transport sweeps. This report documents our progress to date and describes future plans. It consists of three parts. First, we define the traditional task scheduling problem and describe relevant related work. Second, we describe our progress in building a C++ task scheduling library that will provide a testbed in which to evaluate and compare the scheduling algorithms we design, including an experimental evaluation of the (traditional) scheduling algorithms implemented to date. Third, we briefly outline our plans for future work.


Run-time Parallelization Optimization Techniques, Hao Yu, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC.) Lecture Notes in Computer Science, Vol: 1863, La Jolla, California, USA, Aug 1999. DOI: 10.1007/3-540-44905-1_36
Keywords: Parallelising Compilers
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-44905-1_36,
author="Yu, Hao
and Rauchwerger, Lawrence",
editor="Carter, Larry
and Ferrante, Jeanne",
title="Run-Time Parallelization Optimization Techniques",
booktitle="Languages and Compilers for Parallel Computing",
year="2000",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="481--484",
abstract="In this paper we first present several compiler techniques to reduce the overhead of run-time parallelization. We show how to use static control flow information to reduce the number of memory references that need to be traced at run-time. Then we introduce several methods designed specifically for the parallelization of sparse applications. We detail some heuristics on how to speculate on the type and data structures used by the original code and thus reduce the memory requirements for tracing the sparse access patterns without performing any additional work. Optimization techniques for the sparse reduction parallelization and speculative loop distribution conclude the paper.",
isbn="978-3-540-44905-8"
}


Abstract

In this paper we first present several compiler techniques to reduce the overhead of run-time parallelization. We show how to use static control flow information to reduce the number of memory references that need to be traced at run-time. Then we introduce several methods designed specifically for the parallelization of sparse applications. We detail some heuristics on how to speculate on the type and data structures used by the original code and thus reduce the memory requirements for tracing the sparse access patterns without performing any additional work. Optimization techniques for the sparse reduction parallelization and speculative loop distribution conclude the paper.


The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization, Lawrence Rauchwerger, David Padua, IEEE Transactions on Parallel and Distributed Systems, Vol: 10, Issue: 2, pp. 160-180, Feb 1999. DOI: 10.1109/71.752782
Keywords: Data Dependence, Parallel Processing, Parallelising Compilers
Links : [Published]

BibTex

@ARTICLE{752782, author={L. {Rauchwerger} and D. A. {Padua}}, journal={IEEE Transactions on Parallel and Distributed Systems}, title={The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization}, year={1999}, volume={10}, number={2}, pages={160-180}, doi={10.1109/71.752782}}


Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is reexecuted serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: it detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks, which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods.


Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors, Ye Zhang, Lawrence Rauchwerger, Josep Torrellas, Proceedings Fifth International Symposium on High-Performance Computer Architecture, pp. 135-139, Orlando, FL, USA, USA, Jan 1999. DOI: 10.1109/HPCA.1999.744351
Keywords: Computer Architecture, High-Performance Computing, Optimizing Compilers
Links : [Published]

BibTex

@INPROCEEDINGS{744351, author={ {Ye Zhang} and L. {Rauchwerger} and J. {Torrellas}}, booktitle={Proceedings Fifth International Symposium on High-Performance Computer Architecture}, title={Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors}, year={1999}, volume={}, number={}, pages={135-139}, doi={10.1109/HPCA.1999.744351}}


Abstract

Recently, we introduced a novel framework for speculative parallelization in hardware (Y. Zhang et al., 1998). The scheme is based on a software based run time parallelization scheme that we proposed earlier (L. Rauchwerger and D. Padue, 1995). The idea is to execute the code (loops) speculatively in parallel. As parallel execution proceeds, extra hardware added to the directory based cache coherence of the DSM machine detects if there is a dependence violation. If such a violation occurs, execution is interrupted, the state is rolled back in software to the most recent safe state, and the code is re-executed serially from that point. The safe state is typically established at the beginning of the loop. Such a scheme is somewhat related to speculative parallelization inside a multiprocessor chip, which also relies on extending the cache coherence protocol to detect dependence violations. Our scheme, however, is targeted to large scale DSM parallelism. In addition, it does not have some of the limitations of the proposed chip-multiprocessor schemes. Such limitations include the need to bound the size of the speculative state to fit in a buffer or L1 cache, and a strict in-order task commit policy that may result in load imbalance among processors. Unfortunately, our scheme has higher recovery costs if a dependence violation is detected, because execution has to backtrack to a safe state that is usually the beginning of the loop. Therefore, the aim of the paper is to extend our previous hardware scheme to effectively handle codes (loops) with a modest number of cross-iteration dependences.


Developing a Cost Model for Communication on a Symmetric Multiprocessor, Jack Perdue, Master Thesis, College Station, Texas, USA, Dec 1998.
Keywords: High-Performance Computing, Parallel Processing, Scientific Computing
Links : [Published] [Manuscript]

BibTex

@mastersthesis{perdue-ms-1998
, author = "Jack Perdue"
, title = "Developing a cost model for communication on a symmetric multiprocessor"
, school = "Department of Computer Science and Engineering, Texas A\&M University"
, year = "1998"
, month = "December"
, note = "http://hdl.handle.net/1969.1/ETD-TAMU-1998-THESIS-P47"
}


Abstract

Despite numerous advances in hardware design, the full ics. potential of today's parallel processing systems is rarely realized due to the difficulty in predicting how a particular algorithm will utilize the underlying architecture. This study looks at a number of proposed "bridging models'' for the bulk-synchronous parallel processing paradigm that attempt to provide an accurate abstraction of how well an algorithm will exploit the underlying architecture. The study provides evidence that temporal and spatial locality of memory accesses must be considered if a bridging model is to be useful. Additionally, this study presents a pair of prediction functions that serve as indicators for best and worst case performance on a particular class of systems, symmetric multiprocessors -in particular, the SGI Power Challenge.


Principles of Speculative Run-time Parallelization, Devang Patel, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC.) Lecture Notes in Computer Science, Vol: 1656, Chapel Hill, NC, USA, Aug 1998. DOI: 10.1007/3-540-48319-5_21
Keywords: Parallelising Compilers
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-48319-5_21,
author="Patel, Devang
and Rauchwerger, Lawrence",
editor="Chatterjee, Siddhartha
and Prins, Jan F.
and Carter, Larry
and Ferrante, Jeanne
and Li, Zhiyuan
and Sehr, David
and Yew, Pen-Chung",
title="Principles of Speculative Run---Time Parallelization",
booktitle="Languages and Compilers for Parallel Computing",
year="1999",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="323--337",
abstract="Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. We advocate a novel framework for the identification of parallel loops. It speculatively executes a loop as a doall and applies a fully parallel data dependence test to check for any unsatisfied data dependencies; if the test fails, then the loop is re-executed serially. We will present the principles of the design and implementation of a compiler that employs both run-time and static techniques to parallelize dynamic applications. Run-time optimizations always represent a tradeoff between a speculated potential benefit and a certain (sure) overhead that must be paid. We will introduce techniques that take advantage of classic compiler methods to reduce the cost of run-time optimization thus tilting the outcome of speculation in favor of significant performance gains. Experimental results from the PERFECT, SPEC and NCSA Benchmark suites show that these techniques yield speedups not obtainable by any other known method.",
isbn="978-3-540-48319-9"
}


Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. We advocate a novel framework for the identification of parallel loops. It speculatively executes a loop as a doall and applies a fully parallel data dependence test to check for any unsatisfied data dependencies; if the test fails, then the loop is re-executed serially. We will present the principles of the design and implementation of a compiler that employs both run-time and static techniques to parallelize dynamic applications. Run-time optimizations always represent a tradeoff between a speculated potential benefit and a certain (sure) overhead that must be paid. We will introduce techniques that take advantage of classic compiler methods to reduce the cost of run-time optimization thus tilting the outcome of speculation in favor of significant performance gains. Experimental results from the PERFECT, SPEC and NCSA Benchmark suites show that these techniques yield speedups not obtainable by any other known method.


Standard Templates Adaptive Parallel Library (STAPL), Lawrence Rauchwerger, Francisco Arzu, K Ouchi, Languages, Compilers, and Run-Time Systems for Scalable Computers, pp. 402-409, Pittsburgh, PA, USA, May 1998. DOI: 10.1007/3-540-49530-4_32
Keywords: High-Performance Computing, Parallel Programming, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/3-540-49530-4_32,
author=\"Rauchwerger, Lawrence
and Arzu, Francisco
and Ouchi, Koji\",
editor=\"O\'Hallaron, David R.\",
title=\"Standard Templates Adaptive Parallel Library (STAPL)\",
booktitle=\"Languages, Compilers, and Run-Time Systems for Scalable Computers\",
year=\"1998\",
publisher=\"Springer Berlin Heidelberg\",
address=\"Berlin, Heidelberg\",
pages=\"402--409\",
abstract=\"STAPL (Standard Adaptive Parallel Library) is a parallel C++ library designed as a superset of the STL, sequentially consistent for functions with the same name, and executes on uni- or multiprocessors. STAPL is implemented using simple parallel extensions of C++ which provide a SPMD model of parallelism supporting recursive parallelism. The library is intended to be of generic use but emphasizes irregular, non-numeric programs to allow the exploitation of parallelism in areas such as geometric modeling or graph algorithms which use dynamic linked data structures. Each library routine has several dicfferent algorithmic options, and the choice among them will be made adaptively based on a performance model, statistical feedback, and current run-time conditions. Builtc-in performance monitors can measure actual performance and, using an extension of the BSP model predict the relative performance of the algorithmic choices for each library routine. STAPL is intended to possibly replace STL in a user transparent manner and run on small to medium scale shared memory multiprocessors which support OpenMP.\",
isbn=\"978-3-540-49530-7\"
}


Abstract

STAPL (Standard Adaptive Parallel Library) is a parallel C++ library designed as a superset of the STL, sequentially consistent for functions with the same name, and executes on uni- or multiprocessors. STAPL is implemented using simple parallel extensions of C++ which provide a SPMD model of parallelism supporting recursive parallelism. The library is intended to be of generic use but emphasizes irregular, non-numeric programs to allow the exploitation of parallelism in areas such as geometric modeling or graph algorithms which use dynamic linked data structures. Each library routine has several different algorithmic options, and the choice among them will be made adaptively based on a performance model, statistical feedback, and current run-time conditions. Built-in performance monitors can measure actual performance and, using an extension of the BSP model predict the relative performance of the algorithmic choices for each library routine. STAPL is intended to possibly replace STL in a user transparent manner and run on small to medium scale shared memory multiprocessors which support OpenMP.


Parallel programming with Polaris, W. Blume; R. Doallo; R. Eigenmann; J. Grout; J. Hoeflinger; T. Lawrence, IEEE Computer, Vol: 29, Issue: 12, pp. 78-82, Dec 1996. DOI: 10.1109/2.546612
Keywords: Parallel Programming, Parallelising Compilers
Links : [Published]

BibTex

@ARTICLE{546612,
author={W. {Blume} and R. {Doallo} and R. {Eigenmann} and J. {Grout} and J. {Hoeflinger} and T. {Lawrence}},
journal={Computer},
title={Parallel programming with Polaris},
year={1996},
volume={29},
number={12},
pages={78-82},
doi={10.1109/2.546612}}


Restructuring programs for high-speed computers with Polaris, Blume and Eigenmann and Faigin and Grout and Jaejin Lee and Lawrence and Hoeflinger and Padua and Yunheung Paek and Petersen and Pottenger and Rauchwerger and Peng Tu and Weatherford, In Proc. ICPP Workshop on Challenges for Parallel Processing, Ithaca, NY, USA, Aug 1996. DOI: 10.1109/ICPPW.1996.538601
Keywords: Optimizing Compilers, Parallel Programming
Links : [Published]

BibTex

@INPROCEEDINGS {538601,
author = {Blume and Eigenmann and Faigin and Grout and Jaejin Lee and Lawrence and Hoeflinger and Padua and Yunheung Paek and Petersen and Pottenger and Rauchwerger and Peng Tu and Weatherford},
booktitle = {Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing},
title = {Restructuring programs for high-speed computers with Polaris},
year = {1996},
volume = {},
issn = {1530-2016},
pages = {149-161},
keywords = {program compilers;parallel programming},
doi = {10.1109/ICPPW.1996.538601},
url = {https://doi.ieeecomputersociety.org/10.1109/ICPPW.1996.538601},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {aug}
}


A Scalable Method for Run-Time Loop Parallelization, Lawrence Rauchwerger, Nancy M. Amato, David A. Padua, International Journal of Parallel Programming, Vol: 23, Issue: 6, pp. 537-576, Dec 1995. DOI: 10.1007/BF02577866
Keywords: Parallelising Compilers
Links : [Published]

BibTex

@article{cite-key,
Abstract = {Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, well behaved, statically analyzable access patterns. However, they cannot extract a significant fraction of the avaialable, parallelism if the program has a complex and/or statically insufficiently defined access pattern, e.g., simulation programs with irregular domains and/or dynamically changing interactions. Since such programs represent a large fraction of all applications, techniques are needed for extracting their inherent parallelism at run-time. In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generatesinspector code that performas run-time preprocessing of the loop's access pattern, andscheduler code that schedules (and executes) the loop interations. The inspector is fully parallel, uses no sychronization, and can be applied to any loop (from which an inspector can be extracted). In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop:array privatization andreduction parallelization (elementwise). The ability to identify privatizable and reduction variables is very powerful since it eliminates the data dependences involving these variables and},
Author = {Rauchwerger, Lawrence and Amato, Nancy M. and Padua, David A.},
Da = {1995/12/01},
Date-Added = {2020-11-02 22:20:42 -0600},
Date-Modified = {2020-11-02 22:20:42 -0600},
Doi = {10.1007/BF02577866},
Id = {Rauchwerger1995},
Isbn = {1573-7640},
Journal = {International Journal of Parallel Programming},
Number = {6},
Pages = {537--576},
Title = {A scalable method for run-time loop parallelization},
Ty = {JOUR},
Url = {https://doi.org/10.1007/BF02577866},
Volume = {23},
Year = {1995},
Bdsk-Url-1 = {https://doi.org/10.1007/BF02577866}}


Abstract

Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, well behaved, statically analyzable access patterns. However, they cannot extract a significant fraction of the available, parallelism if the program has a complex and/or statically insufficiently defined access pattern, e.g., simulation programs with irregular domains and/or dynamically changing interactions. Since such programs represent a large fraction of all applications, techniques are needed for extracting their inherent parallelism at run-time. In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generates inspector code that perform as run-time preprocessing of the loop's access pattern, and scheduler code that schedules (and executes) the loop interations. The inspector is fully parallel, uses no sychronization, and can be applied to any loop (from which an inspector can be extracted). In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop:array privatization and reduction parallelization (elementwise). The ability to identify privatizable and reduction variables is very powerful since it eliminates the data dependences involving these variables and


Run-Time Methods for Parallelizing Partially Parallel Loops, Lawrence Rauchwerger, Nancy M. Amato, David A. Padua, Proceedings of the 9th International Conference on Supercomputing, pp. 137-146, Barcelona, Spain, Jul 1995. DOI: 10.1145/224538.224553
Keywords: Data Dependence, Parallel Programming, Parallelising Compilers
Links : [Published]

BibTex

@inproceedings{10.1145/224538.224553,
author = {Rauchwerger, Lawrence and Amato, Nancy M. and Padua, David A.},
title = {Run-Time Methods for Parallelizing Partially Parallel Loops},
year = {1995},
isbn = {0897917286},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/224538.224553},
doi = {10.1145/224538.224553},
booktitle = {Proceedings of the 9th International Conference on Supercomputing},
pages = {137–146},
numpages = {10},
location = {Barcelona, Spain},
series = {ICS '95}
}


Abstract

In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generates inspector code that performs run-time preprocessing of the loop’s access pattern, and scheduler code that schedules (and executes) the loop iterations. The inspector is fully parallel, uses no synchronization, and can be applied to any loop. In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and reduction parallelizatiort (element–wise). We also describe a new scheme for constructing an optimal parallel execution schedule for the iterations of the loop.


Parallelizing While Loops for Multiprocessor Systems, Lawrence Rauchwerger, David Padua, Proceedings of 9th International Parallel Processing Symposium (IPPS), pp. 347-356, Santa Barbara, CA, USA, Apr 1995. DOI: 10.1109/IPPS.1995.395955
Keywords: Data Dependence, Parallelising Compilers
Links : [Published]

BibTex

@INPROCEEDINGS{395955, author={L. {Rauchwerger} and D. {Padua}}, booktitle={Proceedings of 9th International Parallel Processing Symposium}, title={Parallelizing while loops for multiprocessor systems}, year={1995}, volume={}, number={}, pages={347-356}, doi={10.1109/IPPS.1995.395955}}


Abstract

Current parallelizing compilers treat while loops and do loops with conditional exits as sequential constructs because their iteration space is unknown. Because these types of loops arise frequently in practice, we have developed techniques that can automatically transform them for parallel execution. We succeed in parallelizing loops involving linked lists traversals-something that has not been done before. This is an important problem since linked list traversals arise frequently in loops with irregular access patterns, such as sparse matrix computations. The methods can even be applied to loops whose data dependence relations cannot be analyzed at compile-time. Experimental results on loops from the PERFECT Benchmarks and sparse matrix packages show that these techniques can yield significant speedups.


The Privatizing DOALL Test: A Run-Time Technique for DOALL Loop Identification and Array Privatization, Lawrence Rauchwerger, David Padua, Proceedings of the 8th International Conference on Supercomputing (ICS), pp. 33-43, Manchester, England, Jul 1994. DOI: 10.1145/181181.181254
Keywords: Data Dependence, Optimizing Compilers, Parallelising Compilers
Links : [Published]

BibTex

@inproceedings{10.1145/181181.181254,
author = {Rauchwerger, Lawrence and Padua, David},
title = {The Privatizing DOALL Test: A Run-Time Technique for DOALL Loop Identification and Array Privatization},
year = {1994},
isbn = {0897916654},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/181181.181254},
doi = {10.1145/181181.181254},
abstract = {Current parallelizing compilers cannot identify a significant fraction of fully parallel loops because they have complex or statically insufficiently defined access patterns. For this reason, we have developed the Privatizing DOALL test—a technique for identifying fully parallel loops at run-time, and dynamically privatizing scalars and arrays. The test itself is fully parallel, and can be applied to any loop, regardless of the structure of its data and/or control flow. The technique can be utilized in two modes: (i) the test is performed before executing the loop and indicates whether the loop can be executed as a DOALL; (ii) speculatively—the loop and the test are executed simultaneously, and it is determined later if the loop was in fact parallel. The test can also be used for debugging parallel programs. We discuss how the test can be inserted automatically by the compiler and outline a cost/performance analysis that can be performed to decide when to use the test. Our conclusion is that the test should almost always be applied—because, as we show, the expected speedup for fully parallel loops is significant, and the cost of a failed test (a not fully parallel loop), is minimal. We present some experimental results on loops from the PERFECT Benchmarks which confirm our conclusion that this test can lead to significant speedups.},
booktitle = {Proceedings of the 8th International Conference on Supercomputing},
pages = {33–43},
numpages = {11},
location = {Manchester, England},
series = {ICS '94}
}


Abstract

Current parallelizing compilers cannot identify a significant fraction of fully parallel loops because they have complex or statically insufficiently defined access patterns. For this reason, we have developed the Privatizing DOALL test—a technique for identifying fully parallel loops at run-time, and dynamically privatizing scalars and arrays. The test itself is fully parallel, and can be applied to any loop, regardless of the structure of its data and/or control flow. The technique can be utilized in two modes: (i) the test is performed before executing the loop and indicates whether the loop can be executed as a DOALL; (ii) speculatively—the loop and the test are executed simultaneously, and it is determined later if the loop was in fact parallel. The test can also be used for debugging parallel programs. We discuss how the test can be inserted automatically by the compiler and outline a cost/performance analysis that can be performed to decide when to use the test. Our conclusion is that the test should almost always be applied—because, as we show, the expected speedup for fully parallel loops is significant, and the cost of a failed test (a not fully parallel loop), is minimal. We present some experimental results on loops from the PERFECT Benchmarks which confirm our conclusion that this test can lead to significant speedups.


Checking Linked Data Structures, Nancy M. Amato and Michael C. Loui, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing, pp. 164-173, Austin, TX, USA, Jun 1994. DOI: 10.1109/FTCS.1994.315644
Keywords: Algorithms, Data Structures
Links : [Published]

BibTex

@inproceedings{amato-ftcs94-c
, author = "Nancy M. Amato and Michael C. Loui"
, title = "Checking Linked Data Structures"
, booktitle = "Proceedings of the 24th Annual International Symposium on Fault-Tolerant Computing (FTCS)"
, year = 1994
, pages = "164--173"
, doi="10.1109/FTCS.1994.315644"
}


Abstract

In the program checking paradigm, the original program is run on the desired input, and its output is checked by another program called a checker. Recently, the notion of program checking has been extended from its original formulation of checking functions to checking a sequence of operations which query and alter the state of an object external to the program, e.g., checking the interactions between a client and the manager (server) of a data structure. In this expanded paradigm, the checker acts as an intermediary between the client, which generates the requests, and the server, which processes them. The checker is allowed a small amount of reliable memory and may provide a probabilistic guarantee of correctness for the client. We present off-line and on-line checkers for data structures such as linked lists, trees, and graphs. Previously, the only data structures for which such checkers existed were random access memories, stacks, and queues.


An NC1 Parallel 3D Convex Hull Algorithm, Nancy M. Amato and Franco P. Preparata, Proceedings of the Ninth Annual Symposium on Computational Geometry, pp. 289-297, Jul 1993. DOI: 10.1145/160985.161152
Keywords: Algorithms, Computational Geometry, Parallel Algorithms
Links : [Published]

BibTex

@inproceedings{amato-socg-93
, author = "N. M. Amato and F. P. Preparata"
, title = "An {NC}$^1$ Parallel {3D} Convex Hull Algorithm"
, booktitle = "Proceedings of the 9th Annual {ACM} Symposium on Computational Geometry"
, year = 1993
, pages = "289--297"
}


Abstract

In this paper we present an O(log n) time parallel algorithm for computing the convex hull of n points in R3. This algorithm uses O(n^(1+α)) processors on a CREW PRAM, for any constant 0 < α ≤ 1. So far, all adequately documented parallel algorithms proposed for this problem use time at least O (log^2 n). In addition, the algorithm presented here is the first parallel algorithm for the three-dimensional convex hull problem that is not based on the serial divide-and-conquer algorithm of Preparata and Hong, whose crucial operation is the merging of the convex hulls of two linearly separated point sets. The contributions of this paper are therefore (i) an O(logn) time parallel algorithm for the three-dimensional convex hull problem, and (ii) a parallel algorithm for this problem that does not follow the traditional divide-and-conquer paradigm.


Next Generation Geographic Modeling Framework Research at USACERL, Kurt Buehler, Jeffrey Wallace, Michael Shapiro, Nancy M. Amato, Unni Narayanan, GRASSClippings: The Journal of Open Geographic Information Systems, Vol: 6, Issue: 3, pp. 35-39, Dec 1992.
Keywords: Algorithms
Links : [Published]

BibTex

@article{
, author = "Kurt Buehler and Jeffrey Wallace and Michael Shapiro and Nancy M. Amato and Unni Narayanan"
, title = "Next Generation Geographic Modeling Framework Research at USACERL"
, journal = "GRASSClippings: The Journal of Open Geographic Information Systems"
, volume = "6"
, number = "3"
, year = "1992"
, pages = "35-39"
}


Abstract

N/A


The Parallel 3D Convex Hull Problem Revisited, Nancy M. Amato and Franco P. Preparata, International Journal of Computational Geometry & Applications, Vol: 2, Issue: 2, pp. 163-173, Jun 1992. DOI: 10.1142/S021819599200010X
Keywords: Computational Geometry, Parallel Algorithms
Links : [Published]

BibTex

@article{amato-ijcga-92
, author = "Nancy M. Amato and Franco P. Preparata"
, title = "The Parallel 3D Convex-Hull Problem Revisited"
, journal = "International Journal of Computational Geometry \& Applications"
, volume = 2
, number = 2
, month = "June"
, year = "1992"
, pages = "163--174"
, doi = "10.1142/S021819599200010X"
}


Abstract

In this paper we prove the correctness of a "local" criterion for computing the convex hull of the union ( "merging") of two disjoint convex polyhedra. This criterion is structural. Therefore it can be algorithmically tested in several ways, not necessarily involving the determination of support (tangent) planes; indeed, it can be implemented by just testing for the intersection of certain planes and lines with convex polytopes. This criterion is amenable to parallel implementation and leads to a provably correct algorithm that computes the convex hull of any n points in three-dimensional space in O(log2 n) time using O(n) processors on a CREW PRAM.


Reversing Trains: A Turn of the Century Sorting Problem, Nancy Amato, Manuel Blum, Sandra Irani, Ronitt Rubinfeld, Journal of Algorithms, Vol: 10, Issue: 3, pp. 413-428, Sep 1989. DOI: 10.1016/0196-6774(89)90037-0
Keywords: Algorithms
Links : [Published]

BibTex

@article{amato-joa89
, author = "Nancy Amato and Manuel Blum and Sandra Irani and Ronitt Rubinfeld"
, title = "Reversing Trains: A Turn of the Century Sorting Problem"
, journal = "Journal of Algorithms"
, volume = "10"
, number = "3"
, month = "September"
, year = "1989"
, pages = "413--428"
, note ="DOI: 10.1016/0196-6774(89)90037-0"
}


Abstract

In this paper we study a reversing train puzzle proposed by Sam Loyd near the turn of the century. We concern ourselves with a version of this puzzle described most recently by A. K. Dewdney in Scientific American. There is a train, locomotive and n cars, that must be entirely reversed using only a short spur line attached to the main track. The efficiency of a solution is determined by summing, for all cars, the total distance moved by each car, where distance is measured in car lengths. We present an O(n2 log2 n) algorithm for accomplishing this task.