STAPL Graph Library

The STAPL Graph Library (SGL) is a distributed-memory high-performance parallel graph processing framework written in C++ using STAPL. In addition to a graph data structure, SGL includes a collection of efficient parallel graph algorithms.

Related Publications

Algorithm-Level Optimizations for Scalable Parallel Graph Processing, Harshvardhan, Doctoral dissertation, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX, USA, May 2018.
Keywords: High-Performance Computing, Parallel Graph Algorithms, Parallel Libraries
Links : [Published]

BibTex

@phdthesis{CitekeyPhdthesis,
author = "Harshvardhan",
title = "Algorithm-Level Optimizations for Scalable Parallel Graph Processing",
school = "Texas A&M University",
address = "College Station, TX, USA",
year = 2018,
month = may
}


Abstract

Efficiently processing large graphs is challenging, since parallel graph algorithms suffer from poor scalability and performance due to many factors, including heavy communication and load-imbalance. Furthermore, it is difficult to express graph algorithms, as users need to understand and effectively utilize the underlying execution of the algorithm on the distributed system. The performance of graph algorithms depends not only on the characteristics of the system (such as latency, available RAM, etc.), but also on the characteristics of the input graph (small-world scalefree, mesh, long-diameter, etc.), and characteristics of the algorithm (sparse computation vs. dense communication). The best execution strategy, therefore, often heavily depends on the combination of input graph, system and algorithm. Fine-grained expression exposes maximum parallelism in the algorithm and allows the user to concentrate on a single vertex, making it easier to express parallel graph algorithms. However, this often loses information about the machine, making it difficult to extract performance and scalability from fine-grained algorithms. To address these issues, we present a model for expressing parallel graph algorithms using a fine-grained expression. Our model decouples the algorithm-writer from the underlying details of the system, graph, and execution and tuning of the algorithm. We also present various graph paradigms that optimize the execution of graph algorithms for various types of input graphs and systems. We show our model is general enough to allow graph algorithms to use the various graph paradigms for the best/fastest execution, and demonstrate good performance and scalability for various different graphs, algorithms, and systems to 100,000+ cores.


Bounded Asynchrony and Nested Parallelism for Scalable Graph Processing, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. Supercomputing (SC), Doctoral Showcase Poster, Denver, CO, Nov 2017.
Keywords: Approximate Algorithms, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

N/A


Abstract

Processing large-scale graphs has become a critical component in a variety of fields, from scientific computing to social analytics. The irregular access pattern for graph workloads, coupled with complex graph structures and large data sizes makes efficiently executing parallel graph workloads challenging. In this dissertation, we develop two broad techniques for improving the performance of general parallel graph algorithms: bounded asynchrony and nested parallelism. Increasing asynchrony in a bounded manner allows one to avoid costly global synchronization at scale, while still avoiding the penalty of unbounded asynchrony including redundant work. Using this technique, we scale a BFS workload to 98,304 cores where traditional methods stop scaling. Additionally, asynchrony enables a new family of approximate algorithms for applications tolerant to fixed amounts of error. Representing graph algorithms in a nested parallel manner enables the full use of available parallelism inherent in graph algorithms, while efficiently managing communication through nested sections.


Fast Approximate Distance Queries in Unweighted Graphs using Bounded Asynchrony, Adam Fidel, Francisco Coral, Colton Riedel, Nancy M. Amato, Lawrence Rauchwerger, Workshop on Languages and Compilers for Parallel Computing (LCPC 2016). Lecture Notes in Computer Science, vol 10136. Springer, Cham., pp. 40-54, Rochester NY, USA, Jan 2017. DOI: 10.1007/978-3-319-52709-3_4
Keywords: Approximate Algorithms, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@inproceedings{
author="Adam Fidel and Francisco Coral Sabido and Colton Riedel and Nancy M. Amato and Lawrence Rauchwerger",
title="Fast Approximate Distance Queries in Unweighted Graphs using Bounded Asynchrony",
year=2017,
booktitle="Languages and Compilers for Parallel Computing (LCPC 2016)",
series="Lecture Notes in Computer Science",
volume="10136",
publisher="Springer, Cham",
note = "DOI: https://doi.org/10.1007/978-3-319-52709-3_4"
}


Abstract

We introduce a new parallel algorithm for approximate breadth-first ordering of an unweighted graph by using bounded asynchrony to parametrically control both the performance and error of the algorithm. This work is based on the k-level asynchronous (KLA) paradigm that trades expensive global synchronizations in the level-synchronous model for local synchronizations in the asynchronous model, which may result in redundant work. Instead of correcting errors introduced by asynchrony and redoing work as in KLA, in this work we control the amount of work that is redone and thus the amount of error allowed, leading to higher performance at the expense of a loss of precision. Results of an implementation of this algorithm are presented on up to 32,768 cores, showing 2.27x improvement over the exact KLA algorithm and 3.8x improvement over the level-synchronous version with minimal error on several graph inputs.


Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory, Ioannis Papadopoulos, Nathan Thomas, Adam Fidel, Dielli Hoxha, Nancy M. Amato, Lawrence Rauchwerger, In Wkshp. on Lang. and Comp. for Par. Comp. (LCPC), pp. 106-121, Chapel Hill, NC, Feb 2016. DOI: 10.1007/978-3-319-29778-1_7
Keywords: Runtime Systems, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-319-29778-1_7,
author="Papadopoulos, Ioannis
and Thomas, Nathan
and Fidel, Adam
and Hoxha, Dielli
and Amato, Nancy M.
and Rauchwerger, Lawrence",
editor="Shen, Xipeng
and Mueller, Frank
and Tuck, James",
title="Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory",
booktitle="Languages and Compilers for Parallel Computing",
year="2016",
publisher="Springer International Publishing",
address="Cham",
pages="106--121",
abstract="Nested parallelism is of increasing interest for both expressivity and performance. Many problems are naturally expressed with this divide-and-conquer software design approach. In addition, programmers with target architecture knowledge employ nested parallelism for performance, imposing a hierarchy in the application to increase locality and resource utilization, often at the cost of implementation complexity.",
isbn="978-3-319-29778-1"
}


Abstract

Nested parallelism is of increasing interest for both expressivity and performance. Many problems are naturally expressed with this divide-and-conquer software design approach. In addition, programmers with target architecture knowledge employ nested parallelism for performance, imposing a hierarchy in the application to increase locality and resource utilization, often at the cost of implementation complexity. While dynamic applications are a natural fit for the approach, support for nested parallelism in distributed systems is generally limited to well-structured applications engineered with distinct phases of intra-node computation and inter-node communication. This model makes expressing irregular applications difficult and also hurts performance by introducing unnecessary latency and synchronizations. In this paper we describe an approach to asynchronous nested parallelism which provides uniform treatment of nested computation across distributed memory. This approach allows efficient execution while supporting dynamic applications which cannot be mapped onto the machine in the rigid manner of regular applications. We use several graph algorithms as examples to demonstrate our library’s expressivity, flexibility, and performance.


An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms (Conference Best Paper Finalist), Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. IEEE Int.Conf. on Parallel Architectures and Compilation Techniques (PACT), pp. 201-212, San Francisco, CA, Oct 2015. DOI: 10.1109/PACT.2015.34
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@INPROCEEDINGS{7429306,
author={ {Harshvardhan} and A. {Fidel} and N. M. {Amato} and L. {Rauchwerger}},
booktitle={2015 International Conference on Parallel Architecture and Compilation (PACT)},
title={An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms},
year={2015},
volume={},
number={},
pages={201-212},}


Abstract

Graph algorithms on distributed-memory systems typically perform heavy communication, often limiting their scalability and performance. This work presents an approach to transparently (without programmer intervention) allow fine-grained graph algorithms to utilize algorithmic communication reduction optimizations. In many graph algorithms, the same information is communicated by a vertex to its neighbors, which we coin algorithmic redundancy. Our approach exploits algorithmic redundancy to reduce communication between vertices located on different processing elements. We employ algorithm-aware coarsening of messages sent during vertex visitation, reducing both the number of messages and the absolute amount of communication in the system. To achieve this, the system structure is represented by a hierarchical graph, facilitating communication optimizations that can take into consideration the machine's memory hierarchy. We also present an optimization for small-world scale-free graphs wherein hub vertices (i.e., vertices of very large degree) are represented in a similar hierarchical manner, which is exploited to increase parallelism and reduce communication. Finally, we present a framework that transparently allows fine-grained graph algorithms to utilize our hierarchical approach without programmer intervention, while improving scalability and performance. Experimental results of our proposed approach on 131,000+ cores show improvements of up to a factor of 8 times over the non-hierarchical version for various graph mining and graph analytics algorithms.


A Hybrid Approach To Processing Big Data Graphs on Memory-Restricted Systems, Harshvardhan, Brandon West, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. Int. Par. and Dist. Proc. Symp. (IPDPS), pp. 799-808, Hyderabad, India, May 2015. DOI: 10.1109/IPDPS.2015.28
Keywords: Big Data, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@INPROCEEDINGS{7161566, author={ {Harshvardhan} and B. {West} and A. {Fidel} and N. M. {Amato} and L. {Rauchwerger}}, booktitle={2015 IEEE International Parallel and Distributed Processing Symposium}, title={A Hybrid Approach to Processing Big Data Graphs on Memory-Restricted Systems}, year={2015}, volume={}, number={}, pages={799-808},}


Abstract

With the advent of big-data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques that can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance. In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into sub graphs that fit in RAM and uses a paging-like technique to load sub graphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16, 000+ cores.


A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms, Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger, In Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPOPP), pp. 285-286, San Francisco, CA, USA, Feb 2015. DOI: 10.1145/2688500.2700994
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/2688500.2700994,
author = {Harshvardhan and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms},
year = {2015},
isbn = {9781450332057},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2688500.2700994},
doi = {10.1145/2688500.2700994},
abstract = { Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5\texttimes{}-8\texttimes{} improvement for a variety of graph algorithms at 12,000+ cores. },
booktitle = {Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
pages = {285–286},
numpages = {2},
keywords = {Distributed Computing, Graph Analytics, Parallel Graph Processing, Big Data},
location = {San Francisco, CA, USA},
series = {PPoPP 2015}
}

@article{10.1145/2858788.2700994,
author = {Harshvardhan and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {A Hierarchical Approach to Reducing Communication in Parallel Graph Algorithms},
year = {2015},
issue_date = {August 2015},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {50},
number = {8},
issn = {0362-1340},
url = {https://doi.org/10.1145/2858788.2700994},
doi = {10.1145/2858788.2700994},
abstract = { Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5\texttimes{}-8\texttimes{} improvement for a variety of graph algorithms at 12,000+ cores. },
journal = {SIGPLAN Not.},
month = jan,
pages = {285–286},
numpages = {2},
keywords = {Distributed Computing, Parallel Graph Processing, Big Data, Graph Analytics}
}


Abstract

Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5×-8× improvement for a variety of graph algorithms at 12,000+ cores.


Efficient, Reachability-based, Parallel Algorithms for Finding Strongly Connected Components, Daniel Tomkins, Timmie Smith, Nancy M. Amato, Lawrence Rauchwerger, Technical Report, TR15-002, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, Jan 2015.
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

N/A


Abstract

Large, complex graphs are increasingly used to represent unstructured data in scientific applications. Applications such as discrete ordinates methods for radiation transport require these graphs to be directed with all cycles eliminated, where cycles are identified by finding the graph.s strongly connected components (SCCs). Deterministic parallel algorithms identify SCCs in polylogarithmic time, but the work of the algorithms on sparse graphs exceeds the linear sequential complexity bound. Randomized algorithms based on reachability — the ability to get from one vertex to another along a directed path — are generally favorable for sparse graphs, but either do not perform well on all graphs or introduce substantial overheads. This paper introduces two new algorithms, SCCMulti and SCCMulti-PriorityAware, which perform well on all graphs without significant overhead. We also provide a characterization that compares reachabilitybased SCC algorithms. Experimental results demonstrate scalability for both algorithms to 393,216 of cores on several types of graphs.


Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with Vertex Delegates, Roger Pearce, Maya Gokhale, Nancy M. Amato, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 549-559, New Orleans, Louisiana, USA, Nov 2014. DOI: 10.1109/SC.2014.50
Keywords: Parallel Graph Algorithms
Links : [Published]

BibTex

@inproceedings{DBLP:conf/sc/PearceGA14,
author = {Roger A. Pearce and
Maya B. Gokhale and
Nancy M. Amato},
editor = {Trish Damkroger and
Jack J. Dongarra},
title = {Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with
Vertex Delegates},
booktitle = {International Conference for High Performance Computing, Networking,
Storage and Analysis, {SC} 2014, New Orleans, LA, USA, November 16-21,
2014},
pages = {549--559},
publisher = {{IEEE} Computer Society},
year = {2014},
url = {https://doi.org/10.1109/SC.2014.50},
doi = {10.1109/SC.2014.50},
timestamp = {Mon, 20 Apr 2020 17:14:43 +0200},
biburl = {https://dblp.org/rec/conf/sc/PearceGA14.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


Abstract

At extreme scale, irregularities in the structure of scale-free graphs such as social network graphs limit our ability to analyze these important and growing datasets. A key challenge is the presence of high-degree vertices (hubs), that leads to parallel workload and storage imbalances. The imbalances occur because existing partitioning techniques are not able to effectively partition high-degree vertices. We present techniques to distribute storage, computation, and communication of hubs for extreme scale graphs in distributed memory supercomputers. To balance the hub processing workload, we distribute hub data structures and related computation among a set of delegates. The delegates coordinate using highly optimized, yet portable, asynchronous broadcast and reduction operations. We demonstrate scalability of our new algorithmic technique using Breadth-First Search (BFS), Single Source Shortest Path (SSSP), K-Core Decomposition, and Page-Rank on synthetically generated scale-free graphs. Our results show excellent scalability on large scale-free graphs up to 131K cores of the IBM BG/P, and outperform the best known Graph500 performance on BG/P Intrepid by 15%.


KLA: A New Algorithmic Paradigm for Parallel Graph Computations (Conference Best Paper), Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, In Proc. IEEE Int.Conf. on Parallel Architectures and Compilation Techniques (PACT), pp. 27-38, Edmonton, AB, Canada, Aug 2014. DOI: 10.1145/2628071.2628091
Keywords: Parallel Algorithms, Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/2628071.2628091,
author = {Harshvardhan and Fidel, Adam and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {KLA: A New Algorithmic Paradigm for Parallel Graph Computations},
year = {2014},
isbn = {9781450328098},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2628071.2628091},
doi = {10.1145/2628071.2628091},
abstract = {This paper proposes a new algorithmic paradigm - k-level asynchronous (KLA) - that bridges level-synchronous and asynchronous paradigms for processing graphs. The KLA paradigm enables the level of asynchrony in parallel graph algorithms to be parametrically varied from none (level-synchronous) to full (asynchronous). The motivation is to improve execution times through an appropriate trade-off between the use of fewer, but more expensive global synchronizations, as in level-synchronous algorithms, and more, but less expensive local synchronizations (and perhaps also redundant work), as in asynchronous algorithms. We show how common patterns in graph algorithms can be expressed in the KLA pardigm and provide techniques for determining k, the number of asynchronous steps allowed between global synchronizations. Results of an implementation of KLA in the STAPL Graph Library show excellent scalability on up to 96K cores and improvements of 10x or more over level-synchronous and asynchronous versions for graph algorithms such as breadth-first search, PageRank, k-core decomposition and others on certain classes of real-world graphs.},
booktitle = {Proceedings of the 23rd International Conference on Parallel Architectures and Compilation},
pages = {27–38},
numpages = {12},
keywords = {parallel algorithms, big data, distributed computing, graph analytics, asynchronous graph algorithms},
location = {Edmonton, AB, Canada},
series = {PACT '14}
}


Abstract

This paper proposes a new algorithmic paradigm - k-level asynchronous (KLA) - that bridges level-synchronous and asynchronous paradigms for processing graphs. The KLA paradigm enables the level of asynchrony in parallel graph algorithms to be parametrically varied from none (level-synchronous) to full (asynchronous). The motivation is to improve execution times through an appropriate trade-off between the use of fewer, but more expensive global synchronizations, as in level-synchronous algorithms, and more, but less expensive local synchronizations (and perhaps also redundant work), as in asynchronous algorithms. We show how common patterns in graph algorithms can be expressed in the KLA pardigm and provide techniques for determining k, the number of asynchronous steps allowed between global synchronizations. Results of an implementation of KLA in the STAPL Graph Library show excellent scalability on up to 96K cores and improvements of 10× or more over level-synchronous and asynchronous versions for graph algorithms such as breadth-first search, PageRank, k-core decomposition and others on certain classes of real-world graphs.


Processing Big Data Graphs on Memory-Restricted Systems, Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger, In Proc. IEEE Int.Conf. on Parallel Architectures and Compilation Techniques (PACT), pp. 517-518, Edmonton, AB, Canada, Aug 2014. DOI: 10.1145/2628071.2671429
Keywords: Big Data, Out-Of-Core Graph Algorithms, Parallel Graph Algorithms
Links : [Published]

BibTex

@inproceedings{10.1145/2628071.2671429,
author = {Harshvardhan and Amato, Nancy M. and Rauchweger, Lawrence},
title = {Processing Big Data Graphs on Memory-Restricted Systems},
year = {2014},
isbn = {9781450328098},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2628071.2671429},
doi = {10.1145/2628071.2671429},
abstract = {With the advent of big-data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques, which can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance.In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into subgraphs that fit in RAM and uses a paging-like technique to load subgraphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16,000+ cores.},
booktitle = {Proceedings of the 23rd International Conference on Parallel Architectures and Compilation},
pages = {517–518},
numpages = {2},
keywords = {out-of-core graph algorithms, distributed computing, graph analytics, big data, parallel graph processing},
location = {Edmonton, AB, Canada},
series = {PACT '14}
}


Abstract

With the advent of big-data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques, which can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance. Contribution. In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into subgraphs that fit in RAM and uses a paging-like technique to load subgraphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16, 000+ cores.


The STAPL Parallel Graph Library, Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger, Languages and Compilers for Parallel Computing (LCPC), pp. 46-60, Tokyo, Japan, Sep 2012. DOI: 10.1007/978-3-642-37658-0_4
Keywords: Parallel Graph Algorithms, STAPL
Links : [Published]

BibTex

@InProceedings{10.1007/978-3-642-37658-0_4,
author=\"Harshvardhan
and Fidel, Adam
and Amato, Nancy M.
and Rauchwerger, Lawrence\",
editor=\"Kasahara, Hironori
and Kimura, Keiji\",
title=\"The STAPL Parallel Graph Library\",
booktitle=\"Languages and Compilers for Parallel Computing\",
year=\"2013\",
publisher=\"Springer Berlin Heidelberg\",
address=\"Berlin, Heidelberg\",
pages=\"46--60\",
abstract=\"This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraphpViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges.\",
isbn=\"978-3-642-37658-0\"
}


Abstract

This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges.