Paceline: Improving Single-Thread Performance in Nanoscale CMPs through Core Overclocking

Brian Greskamp and Josep Torrellas
University of Illinois


Abstract

Manufacturers set the clock frequencies of their high-performance microprocessors under conservative worst-case assumptions. This lets them guarantee error-free operation even under extreme conditions, but it incurs a performance penalty in the nominal case, where a chip can safely run much faster than its rated clock speed. As process technology scales, worsening process variation, temperature variation, and aging effects conspire to widen the frequency safety margins. Consequently, the gap between nominal-case and worst-case frequency is growing, to the point that we predict a 30% difference between nominal-case and rated (worst-case) frequencies in 32nm technology.

"Paceline" is a master-checker scheme that recovers the performance lost to conservative ratings by using two cores of a CMP to redundantly execute a single thread. The master is clocked at the nominal frequency (up to 1.3x the rated frequency), while the checker runs at the rated, safe frequency. The master passes branch results and prefetch information to the checker, allowing it to keep up with the accelerated master. Comprehensive error detection and recovery ensure that a Paceline system has an error rate comparable to that of a traditional dual modular redundant (DMR) processor while attaining a 21% speedup on SPECint and a 9% speedup on SPECfp. Moreover, Paceline achieves these gains without significantly increasing power density or design effort.
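
To put the reported speedups in context, the short sketch below compares them with the frequency headroom quoted above. It is an illustration only: it assumes that single-thread speedup cannot exceed the clock ratio between master and checker, and the function name `headroom_captured` is ours, not part of Paceline.

```python
# Figures quoted in the abstract.
FREQ_RATIO = 1.30  # master clock relative to the rated clock (upper end)
REPORTED_SPEEDUP = {"SPECint": 1.21, "SPECfp": 1.09}


def headroom_captured(speedup: float, freq_ratio: float = FREQ_RATIO) -> float:
    """Return the fraction of the frequency headroom realized as speedup.

    Assumption (ours, for illustration): speedup is bounded above by the
    clock ratio, so a value of 1.0 would mean the full headroom was used.
    """
    if freq_ratio <= 1.0:
        raise ValueError("freq_ratio must exceed 1.0")
    return (speedup - 1.0) / (freq_ratio - 1.0)


for suite, speedup in REPORTED_SPEEDUP.items():
    print(f"{suite}: {headroom_captured(speedup):.0%} of the headroom")
```

Under this simplified bound, the SPECint result corresponds to roughly 70% of the available headroom and the SPECfp result to roughly 30%.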