Nathaniel McIntosh

In this paper we present a comprehensive compiler framework for improving the efficiency of compiler-directed software prefetching on cache-coherent distributed shared-memory multiprocessors. The key component of our work is a form of global data-flow analysis that predicts, at compile time, the sets of array references that are likely to cause coherence activity at run time. The data-flow framework accurately analyzes the cache behavior of a parallel program by combining array section analysis with knowledge of the cache configuration and an encoding of the target machine's cache coherence protocol. Existing prefetching algorithms have difficulty issuing prefetches for coherence misses early enough; the resulting late prefetches leave much of the miss latency exposed. Our compiler identifies the particular variable references and loop iterations that cause coherence misses and schedules prefetches for these references farther in advance, effectively hiding the latency they incur. In other situations where existing prefetching techniques encounter difficulties, such as false sharing and many-processor read sharing, we use data-flow information to apply optimizations that decrease interconnect traffic and reduce the memory latency penalties incurred by the program.
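To make the idea of predicting coherence misses from array sections concrete, the following toy sketch intersects per-processor array sections at cache-line granularity. It is not the framework described above, only a simplified illustration under stated assumptions: sections are 1-D inclusive intervals of element indices, the array starts on a cache-line boundary, the protocol is a simple write-invalidate scheme, and every reader is assumed to have held the line before the other processor's write. The names `Section`, `lines_of`, `predicted_coherence_misses`, and `falsely_shared_lines` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Hypothetical 1-D regular array section [lo, hi], inclusive element indices."""
    lo: int
    hi: int


def lines_of(sec, elems_per_line):
    """Cache-line indices spanned by a section (assumes the array is line-aligned)."""
    return set(range(sec.lo // elems_per_line, sec.hi // elems_per_line + 1))


def predicted_coherence_misses(writes, reads, elems_per_line):
    """Predict which lines each processor misses on in the current phase.

    writes: dict processor -> Section written in the previous parallel phase
    reads:  dict processor -> Section read in the current phase
    Assumption: write-invalidate protocol, so a reader's copy of a line is
    invalidated whenever any *other* processor wrote that line.
    """
    misses = {}
    for q, rsec in reads.items():
        invalidated = set()
        for p, wsec in writes.items():
            if p != q:
                invalidated |= lines_of(wsec, elems_per_line)
        misses[q] = sorted(lines_of(rsec, elems_per_line) & invalidated)
    return misses


def falsely_shared_lines(writes, elems_per_line):
    """Lines written by more than one processor.

    Assumes the written sections are disjoint in elements, so any line with
    several writers is shared only because of the line granularity (false sharing).
    """
    writers = {}
    for p, wsec in writes.items():
        for line in lines_of(wsec, elems_per_line):
            writers.setdefault(line, set()).add(p)
    return sorted(line for line, ps in writers.items() if len(ps) > 1)


# Block-partitioned writes of a 16-element array, then a stencil-like read
# that reaches one element into the neighbour's block (4 elements per line).
writes = {0: Section(0, 7), 1: Section(8, 15)}
reads = {0: Section(0, 8), 1: Section(7, 15)}
print(predicted_coherence_misses(writes, reads, 4))  # {0: [2], 1: [1]}
print(falsely_shared_lines({0: Section(0, 5), 1: Section(6, 11)}, 4))  # [1]
```

In this example only the boundary line each processor reads from its neighbour's block is flagged, which is the kind of reference a compiler could target with an earlier-scheduled prefetch; the second call flags the single line that two disjoint write sections share.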