STAPL Components
The Standard Template Adaptive Parallel Library (STAPL) comprises several components: the run-time system, PARAGRAPH, the Skeleton Framework, pContainers, pViews, and pAlgorithms.
STAPL’s run-time system (RTS) provides the Developer and the Specialist with the following facilities:
- Communication primitives based on Adaptive Remote Method Invocation (ARMI);
- An executor for pRange tasks that enforces task dependencies;
- A user-definable scheduler for pRange tasks;
- A performance monitor that supports adaptivity and provides feedback to the user.
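The division of labor in the list above, where the executor enforces dependencies while a user-definable scheduler only decides which ready task runs next, can be illustrated with a small single-process sketch. `TaskGraph`, `FifoScheduler`, `LifoScheduler` and `execute` are hypothetical names chosen for illustration; this is not the STAPL RTS interface, and tasks run sequentially here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical task graph: each task has a body and a list of successors.
struct TaskGraph {
  std::vector<std::function<void()>> body;
  std::vector<std::vector<std::size_t>> successors;
  std::vector<std::size_t> pending;  // number of unfinished predecessors

  std::size_t add_task(std::function<void()> f) {
    body.push_back(std::move(f));
    successors.emplace_back();
    pending.push_back(0);
    return body.size() - 1;
  }
  // Task `after` may only run once task `before` has finished.
  void add_dependency(std::size_t before, std::size_t after) {
    successors[before].push_back(after);
    ++pending[after];
  }
};

// Two interchangeable scheduling policies over the set of ready tasks.
struct FifoScheduler {
  std::deque<std::size_t> ready;
  void push(std::size_t t) { ready.push_back(t); }
  std::size_t pop() { std::size_t t = ready.front(); ready.pop_front(); return t; }
  bool empty() const { return ready.empty(); }
};
struct LifoScheduler {
  std::vector<std::size_t> ready;
  void push(std::size_t t) { ready.push_back(t); }
  std::size_t pop() { std::size_t t = ready.back(); ready.pop_back(); return t; }
  bool empty() const { return ready.empty(); }
};

// The executor enforces dependencies; the scheduler only orders ready tasks.
// The graph is taken by value so the same graph can be executed again.
// Returns the number of tasks run (fewer than the task count if there is a cycle).
template <typename Scheduler>
std::size_t execute(TaskGraph g, Scheduler sched) {
  for (std::size_t t = 0; t < g.body.size(); ++t)
    if (g.pending[t] == 0) sched.push(t);
  std::size_t executed = 0;
  while (!sched.empty()) {
    std::size_t t = sched.pop();
    g.body[t]();
    ++executed;
    for (std::size_t s : g.successors[t])
      if (--g.pending[s] == 0) sched.push(s);  // all predecessors finished
  }
  return executed;
}
```

Swapping `FifoScheduler` for `LifoScheduler` changes the order in which independent tasks run, but never lets a task start before its predecessors, which is the property the executor is responsible for.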
The ARMI communication library hides the details of the underlying platform by building on lower-level communication facilities such as MPI, OpenMP, and Pthreads. Its interface supports a well-defined consistency model, which allows developers to design algorithms in a uniform and portable way.
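To illustrate the layering described above, here is a minimal single-process sketch in which a remote-method-invocation layer is written only against an abstract transport. A real back end would move requests over MPI or between threads; this loopback transport simply queues requests per location and runs them when a fence is reached. `Transport`, `LoopbackTransport`, `RmiLayer`, `async_call` and `fence` are hypothetical names; this is not the ARMI interface, and it models nothing of ARMI's consistency guarantees beyond in-order delivery per destination.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Platform-specific layer: only knows how to ship opaque requests.
class Transport {
 public:
  using Request = std::function<void()>;
  virtual ~Transport() = default;
  virtual void send(std::size_t dest, Request r) = 0;
  // Run every queued request (including ones enqueued while running).
  virtual std::size_t flush() = 0;
};

// Stand-in back end: one in-order queue per location, all in one process.
class LoopbackTransport : public Transport {
 public:
  explicit LoopbackTransport(std::size_t locations) : queues_(locations) {}
  void send(std::size_t dest, Request r) override {
    queues_.at(dest).push_back(std::move(r));
  }
  std::size_t flush() override {
    std::size_t ran = 0;
    bool progress = true;
    while (progress) {  // a request may send further requests
      progress = false;
      for (auto& q : queues_) {
        while (!q.empty()) {
          Request r = std::move(q.front());
          q.pop_front();
          r();
          ++ran;
          progress = true;
        }
      }
    }
    return ran;
  }

 private:
  std::vector<std::deque<Request>> queues_;
};

// Portable layer: invokes a member function on the object owned by `dest`,
// without knowing which transport carries the request.
template <typename T>
class RmiLayer {
 public:
  RmiLayer(Transport& t, std::vector<T>& objects) : t_(t), objs_(objects) {}
  template <typename Method, typename... Args>
  void async_call(std::size_t dest, Method m, Args... args) {
    T* target = &objs_.at(dest);
    t_.send(dest, [target, m, args...] { (target->*m)(args...); });
  }
  // Wait until all outstanding calls have been executed.
  std::size_t fence() { return t_.flush(); }

 private:
  Transport& t_;
  std::vector<T>& objs_;
};
```

Because `RmiLayer` depends only on `Transport`, porting it to another platform would mean writing a new transport, not touching the code that issues the calls.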
PARAGRAPH is the STAPL data flow engine, which allows parallelism to be expressed explicitly as data flow graphs (also known as task graphs).
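As a rough analogy for expressing parallelism as a data flow graph, the sketch below uses standard C++ futures: each node is a task, and each edge is a shared future that a consumer waits on. This shows the concept only; it does not use the PARAGRAPH interface, and `dataflow_example` is a hypothetical name.

```cpp
#include <cassert>
#include <future>

// A three-node data flow graph:  x -> a,  x -> b,  (a, b) -> c.
// There is no edge between a and b, so they may run in parallel;
// c consumes both results and therefore waits for them.
int dataflow_example(int x) {
  std::shared_future<int> a =
      std::async(std::launch::async, [x] { return x * 2; }).share();
  std::shared_future<int> b =
      std::async(std::launch::async, [x] { return x + 3; }).share();
  std::future<int> c = std::async(std::launch::async, [a, b] {
    return a.get() * b.get();  // blocks until both inputs are available
  });
  return c.get();
}
```

In this sketch a waiting node occupies a thread while it blocks; a dedicated data flow engine would typically fire a node only once its inputs are ready, which scales better to large graphs.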
The STAPL Skeleton Framework is an interface for algorithm development that lets developers focus solely on defining their computations in terms of skeletons. Each skeleton is translated into a parametric data flow graph, which is expanded once input data are available. This data flow representation allows skeleton-based programs to run on both distributed and shared memory systems.
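The idea of describing a computation by composing skeletons can be sketched with plain higher-order functions. Below, `map_skeleton`, `reduce_skeleton` and `compose` are hypothetical helpers (not the STAPL Skeleton Framework API) that build a map-reduce out of two basic skeletons. A framework like the one described above would translate each skeleton into a data flow graph and run it in parallel; this sketch evaluates sequentially and works on `int` vectors only.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// map: apply f to every element independently (no cross-element dependencies).
template <typename F>
auto map_skeleton(F f) {
  return [f](const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int v : in) out.push_back(f(v));
    return out;
  };
}

// reduce: combine all elements with an associative operator and its identity.
template <typename Op>
auto reduce_skeleton(Op op, int identity) {
  return [op, identity](const std::vector<int>& in) {
    return std::accumulate(in.begin(), in.end(), identity, op);
  };
}

// compose(s1, s2): feed the output of skeleton s1 into skeleton s2.
template <typename S1, typename S2>
auto compose(S1 s1, S2 s2) {
  return [s1, s2](const std::vector<int>& in) { return s2(s1(in)); };
}
```

For example, `compose(map_skeleton(square), reduce_skeleton(plus, 0))` describes a sum of squares without saying anything about how the work is distributed.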
A pContainer is a distributed data structure with an interface similar to that of the (sequential) C++ Standard Template Library (STL). Its data are partitioned and distributed across the machine, but the User is offered a shared-object view. The distribution of a pContainer across the machine can be specified by the user or selected automatically by STAPL. STAPL provides a large number of basic data-parallel structures (e.g., pArray, pList, pVector, pGraph, pMap, pSet).
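A shared-object view over partitioned data can be illustrated with a block distribution, one common way of splitting an array: a global index is translated into a (location, local offset) pair, and callers only ever use global indices. `BlockArray` and `locate` are hypothetical names. All partitions live in one process here; in a real pContainer each partition would reside on a different location and remote elements would be reached through the run-time system.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical block-distributed array: element i lives on location
// i / block at local offset i % block.
class BlockArray {
 public:
  // Requires locations > 0.
  BlockArray(std::size_t n, std::size_t locations)
      : size_(n), block_((n + locations - 1) / locations), parts_(locations) {
    for (std::size_t p = 0; p < locations; ++p) {
      std::size_t begin = p * block_;
      std::size_t end = std::min(n, begin + block_);
      parts_[p].resize(end > begin ? end - begin : 0);
    }
  }

  // Translate a global index into (location, local offset).
  std::pair<std::size_t, std::size_t> locate(std::size_t i) const {
    assert(i < size_);
    return {i / block_, i % block_};
  }

  // Shared-object view: callers address elements by global index only.
  int& operator[](std::size_t i) {
    auto [loc, off] = locate(i);
    return parts_[loc][off];
  }

  std::size_t size() const { return size_; }
  std::size_t local_size(std::size_t loc) const { return parts_.at(loc).size(); }

 private:
  std::size_t size_;
  std::size_t block_;
  std::vector<std::vector<int>> parts_;  // one partition per location
};
```

Changing the distribution (for example to a cyclic one) would only change `locate` and the constructor, leaving code that uses global indices untouched.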
pViews are the STAPL equivalent of STL iterators in that they provide a generic mechanism for accessing the data of pContainers. pViews emphasize processing ranges of data over accessing single items. Each pView can be partitioned hierarchically into subviews, which allows the degree of parallelism to be adjusted to the application's needs and supports nested parallelism.
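To make hierarchical partitioning concrete, here is a sketch of a view as a lightweight, non-owning range over a container's data that can be split into nearly equal subviews, each of which can be split again. `RangeView` and `partition` are hypothetical names; actual pViews are considerably richer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A view does not own data; it describes the range [first, last)
// of an underlying container.
struct RangeView {
  std::vector<int>* data;
  std::size_t first;
  std::size_t last;

  std::size_t size() const { return last - first; }
  int& operator[](std::size_t i) { return (*data)[first + i]; }

  // Split into k > 0 contiguous subviews whose sizes differ by at most one.
  std::vector<RangeView> partition(std::size_t k) const {
    std::vector<RangeView> parts;
    std::size_t base = size() / k, extra = size() % k, begin = first;
    for (std::size_t p = 0; p < k; ++p) {
      std::size_t len = base + (p < extra ? 1 : 0);
      parts.push_back({data, begin, begin + len});
      begin += len;
    }
    return parts;
  }
};
```

Partitioning a view once gives one unit of work per subview; partitioning a subview again yields a nested level of parallelism, and writes through any subview land in the same underlying container.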
A pAlgorithm is the parallel equivalent of an STL algorithm and is written in terms of pView operations. The hierarchical structure of the input pViews and the algorithm's access pattern determine the degree of parallelism available to the computation. A pAlgorithm may also modify its input pViews to optimize data access or to simplify the algorithm's specification.
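How the partitioning of the input bounds the available parallelism can be illustrated with a parallel accumulate: the input is split into one contiguous piece per worker, each worker reduces its piece independently, and the partial results are then combined. `parallel_accumulate` is a hypothetical name, and `std::thread` stands in for the STAPL run-time system.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum v using `pieces` worker threads, one per contiguous subrange.
long long parallel_accumulate(const std::vector<int>& v, std::size_t pieces) {
  if (pieces == 0) pieces = 1;
  std::vector<long long> partial(pieces, 0);  // one result slot per worker
  std::vector<std::thread> workers;
  std::size_t base = v.size() / pieces, extra = v.size() % pieces, begin = 0;
  for (std::size_t p = 0; p < pieces; ++p) {
    std::size_t len = base + (p < extra ? 1 : 0);
    auto first = v.begin() + static_cast<std::ptrdiff_t>(begin);
    auto last = first + static_cast<std::ptrdiff_t>(len);
    // Each worker reads only its own subrange and writes only its own slot,
    // so no synchronization is needed until the join.
    workers.emplace_back([first, last, &partial, p] {
      partial[p] = std::accumulate(first, last, 0LL);
    });
    begin += len;
  }
  for (auto& w : workers) w.join();
  // Combine the per-piece results sequentially.
  return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The algorithm itself is the same for any number of pieces; only the partitioning of the input decides how many workers can run at once.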
The STAPL Graph Library (SGL) is a distributed-memory high-performance parallel graph processing framework written in C++ using STAPL. In addition to a graph data structure, SGL includes a collection of efficient parallel graph algorithms.
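As an example of the kind of algorithm a parallel graph library provides, the sketch below computes breadth-first search levels in the level-synchronous style commonly used by parallel graph frameworks: all vertices of the current frontier can be expanded independently, and a barrier separates one level from the next. It is sequential and uses a plain adjacency list; `bfs_levels` is a hypothetical name and this is not the SGL interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the BFS level of every vertex (-1 if unreachable from source).
std::vector<int> bfs_levels(const std::vector<std::vector<std::size_t>>& adj,
                            std::size_t source) {
  assert(source < adj.size());
  std::vector<int> level(adj.size(), -1);
  std::vector<std::size_t> frontier{source};
  level[source] = 0;
  for (int depth = 1; !frontier.empty(); ++depth) {
    std::vector<std::size_t> next;
    // Frontier vertices could be expanded in parallel; marking a neighbour
    // as visited would then need an atomic check-and-set.
    for (std::size_t u : frontier)
      for (std::size_t w : adj[u])
        if (level[w] == -1) {
          level[w] = depth;
          next.push_back(w);
        }
    frontier.swap(next);  // the end of a level acts as a barrier
  }
  return level;
}
```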
Publications
- Fidel, A., Amato, N.M., & Rauchwerger, L. (2017). Bounded Asynchrony and Nested Parallelism for Scalable Graph Processing. In Proc. Supercomputing (SC), Doctoral Showcase Poster.
- Fidel, A., Coral, F., Riedel, C., Amato, N.M., & Rauchwerger, L. (2017). Fast Approximate Distance Queries in Unweighted Graphs using Bounded Asynchrony. Workshop on Languages and Compilers for Parallel Computing (LCPC 2016). Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_4
- Papadopoulos, I., Thomas, N., Fidel, A., Hoxha, D., Amato, N.M., & Rauchwerger, L. (2016). Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory. In Wkshp. on Lang. and Comp. for Par. Comp. (LCPC), 106–121. https://doi.org/10.1007/978-3-319-29778-1_7
- Harshvardhan, Fidel, A., Amato, N.M., & Rauchwerger, L. (2015). An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms (Conference Best Paper Finalist). In Proc. IEEE Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), 201–212. https://doi.org/10.1109/PACT.2015.34
- Zandifar, M., Abduljabbar, M., Majidi, A., Keyes, D., Amato, N.M., & Rauchwerger, L. (2015). Composing Algorithmic Skeletons to Express High-Performance Scientific Applications. In Proc. ACM Int. Conf. Supercomputing (ICS), 415–424. https://doi.org/10.1145/2751205.2751241
- Papadopoulos, I., Thomas, N., Fidel, A., Amato, N.M., & Rauchwerger, L. (2015). STAPL-RTS: An Application Driven Runtime System. In Proc. ACM Int. Conf. Supercomputing (ICS), 425–434. https://doi.org/10.1145/2751205.2751233
- Harshvardhan, West, B., Fidel, A., Amato, N.M., & Rauchwerger, L. (2015). A Hybrid Approach To Processing Big Data Graphs on Memory-Restricted Systems. In Proc. Int. Par. and Dist. Proc. Symp. (IPDPS), 799–808. https://doi.org/10.1109/IPDPS.2015.28
- Harshvardhan, Amato, N.M., & Rauchwerger, L. (2015). A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms. In Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPoPP), 50(8), 285–286. https://doi.org/10.1145/2688500.2700994
- Tomkins, D., Smith, T., Amato, N.M., & Rauchwerger, L. (2015). Efficient, Reachability-based, Parallel Algorithms for Finding Strongly Connected Components. Technical Report, TR15-002.
- Bailey, E.S., Hawkins, W.D., Adams, M.L., Brown, P.N., Kunen, A.J., Adams, M.P., Smith, T., Amato, N.M., & Rauchwerger, A.L. (2014). Validation of Full-Domain Massively Parallel Transport Sweep Algorithms. Transactions of the American Nuclear Society, 111, 699–702.
- Zandifar, M., Thomas, N., Amato, N.M., & Rauchwerger, L. (2014). The STAPL Skeleton Framework. In Wkshp. on Lang. and Comp. for Par. Comp. (LCPC), 176–190. https://doi.org/10.1007/978-3-319-17473-0_12
- Harshvardhan, Fidel, A., Amato, N.M., & Rauchwerger, L. (2014). KLA: A New Algorithmic Paradigm for Parallel Graph Computations (Conference Best Paper). In Proc. IEEE Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), 27–38. https://doi.org/10.1145/2628071.2628091
- Fidel, A., Amato, N.M., & Rauchwerger, L. (2014). From Petascale to the Pocket: Adaptively Scaling Parallel Programs for Mobile SoCs. In Proc. IEEE Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), SRC Poster, 511–512. https://doi.org/10.1145/2628071.2671426
- Fidel, A., Jacobs, S.A., Sharma, S., Rauchwerger, L., & Amato, N.M. (2013). Load Balancing Techniques for Scalable Parallelization of Sampling-Based Motion Planning Algorithms. Technical Report, TR13-002, Parasol Laboratory, Department of Computer Science, Texas A&M University.
- Hawkins, W.D., Smith, T., Adams, M.P., Rauchwerger, L., Amato, N., & Adams, M.L. (2012). Efficient Massively Parallel Transport Sweeps. Transactions of the American Nuclear Society, 107(1), 477–481.
- Harshvardhan, Fidel, A., Amato, N.M., & Rauchwerger, L. (2012). The STAPL Parallel Graph Library. Languages and Compilers for Parallel Computing (LCPC). https://doi.org/10.1007/978-3-642-37658-0_4
- Jacobs, S.A. & Amato, N.M. (2011). From Days to Seconds: Scalable Parallel Algorithms for Motion Planning. In ACM Student Research Competition, Conf. on High Performance Computing, Networking, Storage and Analysis, Companion Proceedings.
- Tanase, G., Buss, A., Fidel, A., Harshvardhan, Papadopoulos, I., Pearce, O., Smith, T., Thomas, N., Xu, X., Mourad, N., Vu, J., Bianco, M., Amato, N.M., & Rauchwerger, L. (2011). The STAPL Parallel Container Framework. Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 46(8), 235–246. https://doi.org/10.1145/1941553.1941586
- Buss, A., Fidel, A., Harshvardhan, Smith, T., Tanase, G., Thomas, N., Xu, X., Bianco, M., Amato, N.M., & Rauchwerger, L. (2010). The STAPL pView. Languages and Compilers for Parallel Computing (LCPC). https://doi.org/10.1007/978-3-642-19595-2_18
- Buss, A., Harshvardhan, Papadopoulos, I., Tkachyshyn, O., Smith, T., Tanase, G., Thomas, N., Xu, X., Bianco, M., Amato, N.M., & Rauchwerger, L. (2010). STAPL: Standard Template Adaptive Parallel Library. Proceedings of the 3rd Annual Haifa Experimental Systems Conference, 10. https://doi.org/10.1145/1815695.1815713
- Tanase, G., Xu, X., Buss, A., Harshvardhan, Papadopoulos, I., Tkachyshyn, O., Smith, T., Thomas, N., Bianco, M., Amato, N.M., & Rauchwerger, L. (2009). The STAPL pList. Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing, 16–30. https://doi.org/10.1007/978-3-642-13374-9_2
- Buss, A., Smith, T., Tanase, G., Thomas, N., Bianco, M., Amato, N.M., & Rauchwerger, L. (2008). Design for Interoperability in STAPL: pMatrices and Linear Algebra Algorithms. Languages and Compilers for Parallel Computing (LCPC 2008). Lecture Notes in Computer Science, vol 5335. Springer. https://doi.org/10.1007/978-3-540-89740-8_21
- Tanase, G., Raman, C., Bianco, M., Amato, N.M., & Rauchwerger, L. (2007). Associative Parallel Containers In STAPL. Languages and Compilers for Parallel Computing (LCPC). https://doi.org/10.1007/978-3-540-85261-2_11
- Tanase, G., Bianco, M., Amato, N.M., & Rauchwerger, L. (2007). The STAPL pArray. Proceedings of the 2007 Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture (MEDEA), 73–80. https://doi.org/10.1145/1327171.1327180
- Jula, A. & Rauchwerger, L. (2006). Custom Memory Allocation for Free: Improving Data Locality with Container-Centric Memory Allocation. Languages and Compilers for Parallel Computing (LCPC). Lecture Notes in Computer Science, 299–313. https://doi.org/10.1007/978-3-540-72521-3_22
- Thomas, S., Tanase, G., Dale, L.K., Moreira, J.E., Rauchwerger, L., & Amato, N.M. (2005). Parallel Protein Folding with STAPL. Concurrency and Computation: Practice and Experience, 17(14), 1643–1656. https://doi.org/10.1002/cpe.950
- Thomas, N., Tanase, G., Tkachyshyn, O., Perdue, J., Amato, N.M., & Rauchwerger, L. (2005). A Framework for Adaptive Algorithm Selection in STAPL. Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 277–288. https://doi.org/10.1145/1065944.1065981
- Thomas, S. & Amato, N.M. (2004). Parallel Protein Folding with STAPL. 18th International Parallel and Distributed Processing Symposium (IPDPS), 189. https://doi.org/10.1109/IPDPS.2004.1303204
- Saunders, S. & Rauchwerger, L. (2003). ARMI: An Adaptive, Platform Independent Communication Library. Proceedings of the Ninth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 38(10), 230–241. https://doi.org/10.1145/781498.781534
- Saunders, S. (2003). Object Oriented Abstractions for Communication in Parallel Programs. Master's Thesis, Parasol Laboratory, Department of Computer Science, Texas A&M University.
- Saunders, S. & Rauchwerger, L. (2002). A parallel communication infrastructure for STAPL. Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL).
- An, P., Jula, A., Rus, S., Saunders, S., Smith, T., Tanase, G., Thomas, N., Amato, N., & Rauchwerger, L. (2001). STAPL: An Adaptive, Generic Parallel C++ Library. Languages and Compilers for Parallel Computing (LCPC). Lecture Notes in Computer Science, 193–208. https://doi.org/10.1007/3-540-35767-X_13
- An, P., Jula, A., Rus, S., Saunders, S., Smith, T., Tanase, G., Thomas, N., Amato, N., & Rauchwerger, L. (2001). STAPL: A standard template adaptive parallel C++ library. Int. Wkshp. on Adv. Compiler Technology for High Perf. and Embedded Processors, 10.
- Rauchwerger, L., Arzu, F., & Ouchi, K. (1998). Standard Templates Adaptive Parallel Library (STAPL). Languages, Compilers, and Run-Time Systems for Scalable Computers. https://doi.org/10.1007/3-540-49530-4_32