CRPC-TR99808-S
September 1999

Title: Memory Bandwidth Bottleneck and Its Amelioration by a Compiler
Authors: Chen Ding and Ken Kennedy
Submitted November 1999

Abstract: As the speed gap between CPU and memory widens, the memory hierarchy has become the primary factor limiting program performance. Until now, the principal focus of hardware and software innovations has been overcoming latency. However, the advent of latency tolerance techniques such as non-blocking caches and software prefetching begins the process of trading bandwidth for latency by overlapping and pipelining memory transfers. A direct consequence of such parallel memory transfers is the increased consumption of memory bandwidth. Since actual latency is the inverse of the consumed bandwidth, memory latency cannot be fully tolerated without infinite bandwidth.

This perspective has led us to several intriguing questions. How much data bandwidth does a program actually need? Do current machines provide sufficient bandwidth? If not, can a program be restructured to consume less bandwidth? How different is bandwidth reduction from the traditionally studied problem of latency reduction?

This paper answers these questions in two parts. The first part measures the demand and supply of data bandwidth through a new performance model and demonstrates the serious performance constraint due to the lack of memory bandwidth. The second part studies the problem of bandwidth reduction, including the need for writeback elimination. A new set of compiler techniques is then proposed to minimize the overall memory transfer of a program.

------------------------------------------------------------------------------

Chen Ding
cding@rice.edu
Department of Computer Science
Rice University

Ken Kennedy
ken@rice.edu
Department of Computer Science
Rice University