CRPC-TR98742-S
Revised November 1999

Title: Bandwidth-Based Performance Tuning and Prediction
Authors: Chen Ding and Ken Kennedy
Submitted November 1999

Abstract: As the speed gap widens between CPU and memory, memory hierarchy performance has become the bottleneck for most applications. This is due in part to the difficulty of fully utilizing the deep and complex memory hierarchies found on most modern machines. In the past, various tools for performance tuning and prediction have been developed to improve machine utilization. However, these tools are not effective in practice because they either do not consider the memory hierarchy or do so only with expensive, machine-specific program simulations. In this paper, we first demonstrate that application performance is now primarily limited by memory bandwidth. Based on this observation, we describe a new approach that estimates and monitors memory bandwidth consumption to achieve accurate and efficient performance tuning and prediction. When evaluated on a 3000-line benchmark program, NAS/SP, the bandwidth-based method enabled a user to obtain a speedup of 1.19 by inspecting and tuning only 5% of the source code. Furthermore, its compile-time prediction of overall execution time was within 10% of the actual running time.

-------------------------------------------------------------------------------
Chen Ding              Ken Kennedy
cding@cs.rice.edu      ken@cs.rice.edu

Department of Computer Science
Rice University