Report ID
1997-03
Report Authors
Cong Fu and Tao Yang
Report Date
Abstract
Automatic scheduling for directed acyclic graphs (DAG) and its applications forcoarse-grained irregular problems such as large n-body simulation have beenstudied in the literature. However solving irregular problems with mixedgranularities such as sparse matrix factorization is challenging since itrequires efficient run-time support to execute a DAG schedule. In this paper,we investigate run-time optimization techniques for executing generalasynchronous DAG schedules on distributed memory machines and discuss anapproach for exploiting parallelism from commuting operations in the DAGmodel. Our solution tightly integrates the run-time scheme with a fastcommunication mechanism to eliminate unnecessary overhead in message bufferingand copying. We present a consistency model incorporating the aboveoptimizations, and taking advantage of task dependence properties to ensure thecorrectness of execution. We demonstrate the applications of this scheme insparse matrix factorizations and triangular equation solving for which actualspeedups are hard to obtain. We provide a detailed experimental study on MeikoCS-2 to show that the automatically scheduled code has achieved goodperformance for these difficult problems and the run-time overhead is smallcompared to total execution times.
Document
1997-03.ps706.29 KB