CRPC-TR98751
December 1997

Title: Automatically Tuned Linear Algebra Software
Authors: R. Clint Whaley and Jack J. Dongarra
Submitted August 1998; available as LAPACK Working Note 131

Abstract: This paper describes an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. Producing such software for machines ranging from desktop workstations to embedded processors can be a tedious and time-consuming process. The work described here can help automate much of that process. We concentrate our efforts on the widely used linear algebra kernels called the Basic Linear Algebra Subprograms (BLAS). In particular, the work presented here is for general matrix multiply, DGEMM. However, much of the technology and approach developed here can be applied to the other Level 3 BLAS, and the general strategy can benefit basic linear algebra operations more broadly and may be extended to other important kernel operations.

------------------------------------------------------------------------------
R. Clint Whaley                      Jack J. Dongarra
rcwhaley@cs.utk.edu                  dongarra@cs.utk.edu
Computer Science Department
University of Tennessee - Knoxville
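Automated tuning of this kind often relies on empirical search. Candidate code variants, here differing only in the cache-blocking factor of a matrix multiply, are timed on the target machine, and the fastest variant is kept. The sketch below illustrates that search loop on a toy scale. It is not the authors' code generator. The names `blocked_matmul` and `tune_block_size`, the candidate block sizes, and the test matrices are all hypothetical. In pure Python, interpreter overhead dominates, so the measured timings will not reflect real cache behavior. A real tuner would time generated, compiled kernels instead.

```python
import time


def blocked_matmul(A, B, n, nb):
    """Return C = A*B for n x n matrices (lists of lists) using nb x nb blocks.

    Hypothetical illustration: the loop nest walks the matrices in tiles so
    that, in a compiled kernel, each tile could stay resident in cache.
    """
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                # Multiply one tile of A by one tile of B into a tile of C;
                # min() handles edge tiles when nb does not divide n.
                for i in range(ii, min(ii + nb, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + nb, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += a * Bk[j]
    return C


def tune_block_size(n, candidates, repeats=3):
    """Time each candidate block size and return (fastest_nb, timings).

    Hypothetical empirical search: every candidate is run `repeats` times,
    the best time per candidate is kept, and each result is checked against
    an unblocked reference before it is accepted.
    """
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    reference = blocked_matmul(A, B, n, n)  # a single block is the plain triple loop
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            C = blocked_matmul(A, B, n, nb)
            best = min(best, time.perf_counter() - start)
        if C != reference:
            raise ValueError(f"block size {nb} produced a wrong result")
        timings[nb] = best
    fastest_nb = min(timings, key=timings.get)
    return fastest_nb, timings
```

For example, `tune_block_size(64, [8, 16, 32, 64])` returns whichever candidate ran fastest, along with the per-candidate timings. Correctness is verified before any timing is trusted, because a fast but wrong variant must never be selected.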