Cache Optimization for Coarse Grain Task
Parallel Processing using Inter-Array Padding
Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara
To appear at
16th Workshop on Languages and Compilers for Parallel Computing (LCPC03), College Station, TX, 2-4 October 2003
Abstract
The widespread use of multiprocessor systems has made automatic
parallelizing compilers increasingly important. To further improve the
performance of multiprocessor systems through compilation, multigrain
parallelization is important. In multigrain parallelization, coarse grain task
parallelism among loops and subroutines and near fine grain
parallelism among statements are used in addition to the traditional
loop parallelism. In addition, locality optimization to use the cache
effectively is also important for performance improvement. This
paper describes inter-array padding to minimize cache conflict misses
among macro-tasks, used together with a data localization scheme that decomposes loops
sharing the same arrays to fit the cache size and executes the decomposed
loops consecutively on the same processor. In the performance
evaluation on a Sun Ultra 80 (4 PEs), the OSCAR compiler, on which the proposed
scheme is implemented, gave a 2.5 times speedup over the maximum
performance of the Sun Forte compiler's automatic loop parallelization, on
average across the SPEC CFP95 tomcatv, swim, hydro2d and turb3d programs.
The OSCAR compiler also showed a 2.1 times speedup on an IBM RS/6000
44p-270 (4 PEs) over the XLF compiler.
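
The effect of inter-array padding on conflict misses can be illustrated with a small model. The sketch below is not the OSCAR compiler's algorithm. It assumes a direct-mapped cache with hypothetical size and line-size constants, arrays laid out back to back in memory, and loops decomposed so that each task touches one quarter-cache chunk of every array. All function names are made up for the illustration.

```python
# Minimal model of inter-array padding (illustrative assumptions only).
CACHE_SIZE = 4 * 1024 * 1024   # assumed direct-mapped cache size in bytes
LINE_SIZE = 64                 # assumed cache line size in bytes


def base_addresses(array_bytes, pad_bytes=0):
    """Lay arrays out back to back, inserting pad_bytes after each one."""
    bases, addr = [], 0
    for size in array_bytes:
        bases.append(addr)
        addr += size + pad_bytes
    return bases


def cache_line(addr):
    """Index of the cache line a byte address maps to (direct-mapped)."""
    return (addr % CACHE_SIZE) // LINE_SIZE


def conflicting_pairs(bases, chunk_bytes):
    """Count array pairs whose leading chunks overlap in the cache."""
    total_lines = CACHE_SIZE // LINE_SIZE
    lines_per_chunk = chunk_bytes // LINE_SIZE
    count = 0
    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            d = (cache_line(bases[j]) - cache_line(bases[i])) % total_lines
            # Chunks overlap when their circular distance is below one chunk.
            if min(d, total_lines - d) < lines_per_chunk:
                count += 1
    return count


# Four arrays, each exactly one cache size; each task uses a quarter-cache chunk.
arrays = [CACHE_SIZE] * 4
chunk = CACHE_SIZE // 4

unpadded = base_addresses(arrays)
padded = base_addresses(arrays, pad_bytes=chunk)

print(conflicting_pairs(unpadded, chunk))  # 6: every pair maps to the same lines
print(conflicting_pairs(padded, chunk))    # 0: padding staggers the chunks
```

In this model the pad equals the chunk size, so the four chunks used by one task occupy disjoint quarters of the cache. A real compiler would choose the pad from the actual array sizes and cache parameters.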