Report ID
1995-21
Report Authors
Cong Fu and Tao Yang
Report Date
Abstract
Automatic scheduling for directed acyclic graphs (DAG) and its applications forcoarse-grained irregular problems such as large n-body simulation have beenstudied in the literature. However solving irregular problems with mixedgranularities such as sparse matrix factorization is challenging since itrequires efficient run-time support for executing the DAG schedule. In thispaper, we investigate run-time compilation and supporting techniques forexecuting general asynchronous DAG schedules on distributed memory machines.Our solution tightly integrates the run-time scheme with a fast communicationmechanism to eliminate unnecessary overhead in message buffering and copying,and takes advantage of task dependence properties to ensure the correctness ofexecution. We demonstrate the applications of this scheme in sparse Choleskyand LU factorizations for which actual speedups have been hard to obtain in theliterature. Our experiments on Meiko CS-2 show that the automaticallyscheduled code achieves scalable performance for these problems and therun-time overhead is small compared to the total execution time.
Document
1995-21.ps778.89 KB