CRPC-TR97758
September 1997

Title: A New Parallel Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

Author: Jaeyoung Choi

Submitted August 1998

Abstract: We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), for block-cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor, whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.

------------------------------------------------------------------------------
Jaeyoung Choi
School of Computing
Soongsil University