CRPC-TR99809-S
September 1999

Title: Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse

Authors: Chen Ding and Ken Kennedy

Submitted November 1999

Abstract:
Reusing data in cache is critical to achieving high performance on modern machines because it reduces the impact of the latency and bandwidth limitations of direct memory access. To date, most studies of software memory hierarchy management have focused on the latency problem in loops. However, today's machines are increasingly limited by insufficient memory bandwidth. Latency-oriented techniques are inadequate for this problem because they do not seek to minimize the amount of data transferred from memory over the whole program. To address the bandwidth limitation, this paper explores the potential for global cache reuse, that is, reusing data across loop nests and over the entire program. In particular, the paper investigates a two-step strategy. The first step fuses computations on the same data to enable the caching of repeated accesses. The second step groups data used by the same computation to make them contiguous in memory. While the first step reduces the frequency of memory access, the second step improves its efficiency. The paper demonstrates the effectiveness of this strategy and shows how to automate it in a production compiler.

------------------------------------------------------------------------------
Chen Ding
cding@rice.edu
Department of Computer Science
Rice University

Ken Kennedy
ken@rice.edu
Department of Computer Science
Rice University
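The two-step strategy in the abstract (fuse computations on the same data, then group data used by the same computation) can be illustrated with a small, hypothetical example. The functions `original`, `fused`, `regroup`, and `fused_regrouped` are illustrative names, not taken from the paper, and this is a sketch of the transformations' semantics, not the authors' compiler algorithm. Python hides cache effects, so the sketch only shows that each step preserves the result; the bandwidth benefit appears in compiled code.

```python
# Hypothetical illustration of computation fusion and data regrouping.
# Not the paper's implementation; names are chosen for this sketch only.

def original(a, b):
    """Two separate loops: arrays a and b are streamed from memory twice."""
    n = len(a)
    c = [0.0] * n
    for i in range(n):          # loop 1 reads a and b
        c[i] = a[i] + b[i]
    total = 0.0
    for i in range(n):          # loop 2 reads a and b again
        total += a[i] * b[i]
    return c, total


def fused(a, b):
    """Step 1 (computation fusion): one loop performs both computations,
    so each a[i] and b[i] is reused while it is still in cache."""
    n = len(a)
    c = [0.0] * n
    total = 0.0
    for i in range(n):
        x, y = a[i], b[i]
        c[i] = x + y
        total += x * y
    return c, total


def regroup(a, b):
    """Step 2 (data regrouping): interleave a and b, which are always
    accessed together, so each pair sits contiguously in memory."""
    ab = [0.0] * (2 * len(a))
    ab[0::2] = a
    ab[1::2] = b
    return ab


def fused_regrouped(ab):
    """The fused loop operating on the regrouped (interleaved) layout."""
    n = len(ab) // 2
    c = [0.0] * n
    total = 0.0
    for i in range(n):
        x, y = ab[2 * i], ab[2 * i + 1]
        c[i] = x + y
        total += x * y
    return c, total
```

For example, `original([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` returns `([5.0, 7.0, 9.0], 32.0)`, and the fused and regrouped versions return the same value. In a compiled language, the fused loop reads each array once instead of twice, and the interleaved layout lets each cache line fetched from memory carry both operands a single iteration needs.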