CRPC-TR99808-S						September 1999

Title: Memory Bandwidth Bottleneck and Its Amelioration by a Compiler

Authors: Chen Ding and Ken Kennedy

Submitted November 1999

Abstract:

     As the speed gap between CPU and memory widens, the memory hierarchy has
become the primary factor limiting program performance.  Until now, the
principal focus of hardware and software innovations has been overcoming
latency.  However, the advent of latency tolerance techniques such as
non-blocking cache and software prefetching begins the process of trading
bandwidth for latency by overlapping and pipelining memory transfers.  A
direct consequence of such parallel memory transfers is the increased
consumption of memory bandwidth.  Since actual latency is the inverse of
the consumed bandwidth, memory latency cannot be fully tolerated without
infinite bandwidth.  This perspective has led us to several intriguing
questions.  How much data bandwidth does a program actually need?  Do current
machines provide sufficient bandwidth?  If not, can a program be
restructured to consume less bandwidth?  How different is bandwidth
reduction from the traditionally studied problem of latency reduction?  This
paper answers these questions in two parts.  The first part measures the
demand and supply of data bandwidth through a new performance model and
demonstrates the serious performance constraint due to the lack of memory
bandwidth.  The second part studies the problem of bandwidth reduction
including the need for writeback elimination.  A new set of compiler
techniques is then proposed to minimize the overall memory transfer of a
program.

------------------------------------------------------------------------------

Chen Ding
cding@rice.edu
Department of Computer Science
Rice University

Ken Kennedy
ken@rice.edu
Department of Computer Science
Rice University