CV / Resume
Projects/Fun
- Spring '06 - My Roomba vacuum in little tiny pieces (currently 100% functional)
- 2003-2005 - developing a centralized management system for a number of FPS game servers
- Spring '00 - my old laptop in little tiny pieces (rebuilt successfully and available for use [though, why?])
- Summer '99 - My Explorer Bot - fun with Lego programming
- Summer '98 - Mesh00 - a Java utility to decimate topologies (documentation available)
- Summer '98 - AddJavaDoc.java - a Java utility to add javadoc templates to .java sources (source included)
- Summer '98 - MakeAdapters.java - a Java callback-method generator [I'm no fan of inner classes] (source included)
- Summer '97 - BB32CTF0 - a 3D computer model of the Harvey R. "Bum" Bright Building (Quake map included)
- Summer '97 - BMP2MAP - conversion program for creating BB32CTF0 (source included)
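MakeAdapters.java above generated adapter classes as an alternative to anonymous inner classes for callbacks. A minimal hand-written sketch of the pattern such a generator automates (the listener, adapter, and target names here are illustrative, not from the original source):

```java
// Sketch of the adapter/callback pattern that a generator like
// MakeAdapters.java automates: instead of an anonymous inner class,
// a small named adapter forwards an event to a method on the target.
interface ClickListener {
    void onClick(String buttonId);
}

// Hypothetical generated adapter: forwards the callback to a named
// method on the target object, keeping listener boilerplate out of
// the application class.
class ClickAdapter implements ClickListener {
    private final App target;
    ClickAdapter(App target) { this.target = target; }
    public void onClick(String buttonId) { target.handleClick(buttonId); }
}

class App {
    String lastClicked = "none";
    void handleClick(String buttonId) { lastClicked = buttonId; }
}

public class AdapterDemo {
    public static void main(String[] args) {
        App app = new App();
        ClickListener listener = new ClickAdapter(app);
        listener.onClick("ok");
        System.out.println(app.lastClicked); // prints "ok"
    }
}
```

A generator only has to emit the adapter class per listener interface; the application class stays free of inner classes.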
Publications

An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications, Ravi Iyer, Jack Perdue, Lawrence Rauchwerger, Nancy M. Amato and Laxmi Bhuyan, International Journal of Parallel Programming, Vol. 33, pp. 307-350, Aug 2005. DOI: 10.1007/s10766-004-1187-0

Abstract: As processor technology continues to advance at a rapid pace, the principal performance bottleneck of shared memory systems has become the memory access latency. In order to understand the effects of cache and memory hierarchy on system latencies, performance analysts perform benchmark analysis on existing multiprocessors. In this study, we present a detailed comparison of two architectures, the HP V-Class and the SGI Origin 2000. Our goal is to compare and contrast design techniques used in these multiprocessors. We present the impact of processor design, cache/memory hierarchies and coherence protocol optimizations on the memory system performance of these multiprocessors. We also study the effect of parallelism overheads such as process creation and synchronization on the user-level performance of these multiprocessors. Our experimental methodology uses microbenchmarks as well as scientific applications to characterize the user-level performance. Our microbenchmark results show the impact of L1/L2 cache size and TLB size on uniprocessor load/store latencies, the effect of coherence protocol design/optimizations and data sharing patterns on multiprocessor memory access latencies, and finally the overhead of parallelism. Our application-based evaluation shows the impact of problem size, dominant sharing patterns and number of processors used on speedup and raw execution time. Finally, we use hardware counter measurements to study the correlation of system-level performance metrics and the application's execution time performance.

Keywords: High-Performance Computing, Parallel Processing, Scientific Computing
Links: [Published] BibTex

@article{iyer-ijpp-2005
, author = "Ravi Iyer and Jack Perdue and Lawrence Rauchwerger and Nancy M. Amato and Laxmi Bhuyan"
, title = "An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications"
, journal = "International Journal of Parallel Programming"
, volume = "33"
, pages = "307--350"
, year = "2005"
, doi = "10.1007/s10766-004-1187-0"
}
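A common microbenchmark of the kind the paper describes for exposing cache and TLB latency steps is pointer chasing: each load depends on the previous one, so average time per load jumps as the working set outgrows each level of the hierarchy. A generic sketch (this is an illustration of the technique, not the paper's actual harness):

```java
import java.util.Random;

// Generic pointer-chasing microbenchmark: walk a randomly permuted
// index chain so each load depends on the previous one. As n grows
// past L1, L2, and TLB reach, ns/load steps up accordingly.
public class PointerChase {
    static long chase(int[] next, int steps) {
        int i = 0;
        for (int s = 0; s < steps; s++) i = next[i];
        return i; // return final index so the loop is not optimized away
    }

    // Build a "next" array forming one random cycle over 0..n-1, so the
    // chain visits every slot in an unpredictable order.
    static int[] randomCycle(int n, long seed) {
        int[] idx = new int[n];
        for (int k = 0; k < n; k++) idx[k] = k;
        Random r = new Random(seed);
        for (int k = n - 1; k > 0; k--) {      // Fisher-Yates shuffle
            int j = r.nextInt(k + 1);
            int t = idx[k]; idx[k] = idx[j]; idx[j] = t;
        }
        int[] next = new int[n];
        // Linking idx[k] -> idx[k+1] guarantees a single n-cycle.
        for (int k = 0; k < n; k++) next[idx[k]] = idx[(k + 1) % n];
        return next;
    }

    public static void main(String[] args) {
        int steps = 1 << 20;
        for (int n = 1 << 10; n <= 1 << 22; n <<= 4) {
            int[] next = randomCycle(n, 42);
            chase(next, steps);                    // warm up
            long t0 = System.nanoTime();
            long sink = chase(next, steps);
            double nsPerLoad = (System.nanoTime() - t0) / (double) steps;
            System.out.printf("n=%8d  %.2f ns/load  (sink=%d)%n", n, nsPerLoad, sink);
        }
    }
}
```

The dependent-load chain defeats both prefetching and out-of-order overlap, which is what lets it measure latency rather than bandwidth.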
A Framework for Adaptive Algorithm Selection in STAPL, Nathan Thomas, Gabriel Tanase, Olga Tkachyshyn, Jack Perdue, Nancy M. Amato, Lawrence Rauchwerger, Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 277-288, Chicago, IL, USA, Jun 2005. DOI: 10.1145/1065944.1065981

Abstract: Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.

Keywords: Parallel Algorithms, Parallel Programming, STAPL
Links: [Published] BibTex

@inproceedings{10.1145/1065944.1065981,
  author = {Thomas, Nathan and Tanase, Gabriel and Tkachyshyn, Olga and Perdue, Jack and Amato, Nancy M. and Rauchwerger, Lawrence},
  title = {A Framework for Adaptive Algorithm Selection in STAPL},
  year = {2005},
  isbn = {1595930809},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/1065944.1065981},
  doi = {10.1145/1065944.1065981},
  booktitle = {Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
  pages = {277--288},
  numpages = {12},
  keywords = {machine learning, parallel algorithms, matrix multiplication, sorting, adaptive algorithms},
  location = {Chicago, IL, USA},
  series = {PPoPP '05}
}
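The idea of selecting among functionally equivalent algorithms at run time can be sketched in miniature. In the sequential toy below, a fixed size threshold stands in for the decision rule that the real framework learns from installation benchmarks; the threshold value and class names are made up for illustration:

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Minimal sketch of run-time algorithm selection: two functionally
// equivalent sorts plus a selector. In a real adaptive framework the
// decision rule is trained from benchmark data; here a hard-coded
// size threshold stands in for that learned model.
public class AdaptiveSort {
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int v = a[i], j = i - 1;
            while (j >= 0 && a[j] > v) { a[j + 1] = a[j]; j--; }
            a[j + 1] = v;
        }
    }

    static void librarySort(int[] a) { Arrays.sort(a); }

    // Stand-in for a learned test: small inputs favor insertion sort's
    // low constant factors; larger inputs get the O(n log n) routine.
    static Consumer<int[]> select(int n) {
        return (n <= 64) ? AdaptiveSort::insertionSort : AdaptiveSort::librarySort;
    }

    static void sortAdaptively(int[] a) { select(a.length).accept(a); }

    public static void main(String[] args) {
        int[] small = {5, 3, 1, 4, 2};
        sortAdaptively(small);
        System.out.println(Arrays.toString(small)); // prints [1, 2, 3, 4, 5]
    }
}
```

Callers see one entry point; which implementation runs is decided per input, which is the property the paper's framework provides for parallel sort and matrix multiply.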
Developing a Cost Model for Communication on a Symmetric Multiprocessor, Jack Perdue, Master's Thesis, Texas A&M University, College Station, Texas, USA, Dec 1998.

Abstract: Despite numerous advances in hardware design, the full potential of today's parallel processing systems is rarely realized due to the difficulty of predicting how a particular algorithm will utilize the underlying architecture. This study looks at a number of proposed "bridging models" for the bulk-synchronous parallel processing paradigm that attempt to provide an accurate abstraction of how well an algorithm will exploit the underlying architecture. The study provides evidence that temporal and spatial locality of memory accesses must be considered if a bridging model is to be useful. Additionally, this study presents a pair of prediction functions that serve as indicators of best- and worst-case performance on a particular class of systems, symmetric multiprocessors, in particular the SGI Power Challenge.

Keywords: High-Performance Computing, Parallel Processing, Scientific Computing
Links: [Published] [Manuscript] BibTex

@mastersthesis{perdue-ms-1998
, author = "Jack Perdue"
, title = "Developing a cost model for communication on a symmetric multiprocessor"
, school = "Department of Computer Science and Engineering, Texas A\&M University"
, year = "1998"
, month = "December"
, note = "http://hdl.handle.net/1969.1/ETD-TAMU-1998-THESIS-P47"
}
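The bulk-synchronous "bridging models" the thesis evaluates build on the standard BSP cost formula: a superstep with local work w, maximum per-processor message traffic h, bandwidth parameter g, and barrier cost L is charged w + g*h + L. A sketch of such a predictor, with made-up parameter values rather than measured numbers from the thesis:

```java
// Standard BSP-style superstep cost: w + g*h + L, where w is local
// work, h is the max words any processor sends or receives, g is the
// per-word communication cost, and L is the barrier cost. The g and L
// values below are hypothetical, not measurements from the thesis.
public class BspCost {
    final double g;  // cost per word of communication
    final double L;  // barrier synchronization cost

    BspCost(double g, double L) { this.g = g; this.L = L; }

    double superstep(double w, double h) { return w + g * h + L; }

    public static void main(String[] args) {
        BspCost machine = new BspCost(4.0, 100.0); // hypothetical g, L
        // Two supersteps: compute-heavy, then communication-heavy.
        double total = machine.superstep(10_000, 50)
                     + machine.superstep(500, 2_000);
        System.out.printf("predicted cost: %.0f time units%n", total);
    }
}
```

The thesis's point cuts against the plain model: a single flat g ignores temporal and spatial locality, which is why it argues for locality-aware refinements and brackets real behavior with best- and worst-case prediction functions.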