Home Page for Jack Perdue | Parasol Laboratory


Picture+ Jack Perdue
MS 1998, Texas A&M

Parasol Laboratory url: http://parasollab.web.illinois.edu/~jack/
Department of Computer Science email:
University of Illinois at Urbana-Champaign
Urbana, IL 61801, USA


CV / Resume

Projects/Fun



Publications

An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications, Ravi Iyer, Jack Perdue, Lawrence Rauchwerger, Nancy M. Amato and Laxmi Bhuyan, International Journal of Parallel Programming, Vol: 33, pp. 307-350, Aug 2005. DOI: 10.1007/s10766-004-1187-0
Keywords: High-Performance Computing, Parallel Processing, Scientific Computing
Links : [Published]

BibTex

@article{iyer-ijpp-2005
, author = "Ravi Iyer and Jack Perdue and Lawrence Rauchwerger and Nancy M. Amato and Laxmi Bhuyan"
, title = "An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications"
, journal = "Internaational Journal of Parallel Programming"
, volume = "33"
, pages = "307--350"
, year = "2005"
, doi = "10.1007/s10766-004-1187-0"
}


Abstract

As processor technology continues to advance at a rapid pace, the principal performance bottleneck of shared memory systems has become the memory access latency. In order to understand the effects of cache and memory hierarchy on system latencies, performance analysts perform benchmark analysis on existing multiprocessors. In this study, we present a detailed comparison of two architectures, the HP V-Class and the SGI Origin 2000. Our goal is to compare and contrast design techniques used in these multiprocessors. We present the impact of processor design, cache/memory hierarchies and coherence protocol optimizations on the memory system performance of these multiprocessors. We also study the effect of parallelism overheads such as process creation and synchronization on the user-level performance of these multiprocessors. Our experimental methodology uses microbenchmarks as well as scientific applications to characterize the user-level performance. Our microbenchmark results show the impact of Ll/L2 cache size and TLB size on uniprocessor load/store latencies, the effect of coherence protocol design/optimizations and data sharing patterns on multiprocessor memory access latencies and finally the overhead of parallelism. Our application-based evaluation shows the impact of problem size, dominant sharing patterns and number of Processors used on speedup and raw execution time. Finally, we use hardware counter measurements to study the correlation of system-level performance metrics and the application’s execution time performance.


A Framework for Adaptive Algorithm Selection in STAPL, Nathan Thomas, Gabriel Tanase, Olga Tkachyshyn, Jack Perdue, Nancy M. Amato, Lawrence Rauchwerger, Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 277-288, Chicago, IL, USA, Jun 2005. DOI: 10.1145/1065944.1065981
Keywords: Parallel Algorithms, Parallel Programming, STAPL
Links : [Published]

BibTex

@inproceedings{10.1145/1065944.1065981,
author = {Thomas, Nathan and Tanase, Gabriel and Tkachyshyn, Olga and Perdue, Jack and Amato, Nancy M. and Rauchwerger, Lawrence},
title = {A Framework for Adaptive Algorithm Selection in STAPL},
year = {2005},
isbn = {1595930809},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1065944.1065981},
doi = {10.1145/1065944.1065981},
abstract = {Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.},
booktitle = {Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
pages = {277–288},
numpages = {12},
keywords = {machine learning, parallel algorithms, matrix multiplication, sorting, adaptive algorithms},
location = {Chicago, IL, USA},
series = {PPoPP \'05}
}


Abstract

Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.


Developing a Cost Model for Communication on a Symmetric Multiprocessor, Jack Perdue, Master Thesis, College Station, Texas, USA, Dec 1998.
Keywords: High-Performance Computing, Parallel Processing, Scientific Computing
Links : [Published] [Manuscript]

BibTex

@mastersthesis{perdue-ms-1998
, author = "Jack Perdue"
, title = "Developing a cost model for communication on a symmetric multiprocessor"
, school = "Department of Computer Science and Engineering, Texas A\&M University"
, year = "1998"
, month = "December"
, note = "http://hdl.handle.net/1969.1/ETD-TAMU-1998-THESIS-P47"
}


Abstract

Despite numerous advances in hardware design, the full ics. potential of today's parallel processing systems is rarely realized due to the difficulty in predicting how a particular algorithm will utilize the underlying architecture. This study looks at a number of proposed "bridging models'' for the bulk-synchronous parallel processing paradigm that attempt to provide an accurate abstraction of how well an algorithm will exploit the underlying architecture. The study provides evidence that temporal and spatial locality of memory accesses must be considered if a bridging model is to be useful. Additionally, this study presents a pair of prediction functions that serve as indicators for best and worst case performance on a particular class of systems, symmetric multiprocessors -in particular, the SGI Power Challenge.