RAPID: Scheduling and Run-Time Support for Parallel Irregular Computations.

S+: Fast Sparse Gaussian Elimination (LU Factorization) Code

This project focuses on the study of scheduling algorithms and software systems for exploiting data, task and loop parallelism on distributed memory architectures and workstation clusters. The fast scheduling algorithms we have developed provide effective utilization of computing resources for directed acyclic graphs, iterative task graphs with and without cycles, and task graphs with data parallelism. The developed techniques can be used for performance prediction and code optimization. The main applications are scientific computations such as sparse matrix factorization arising from numerical solutions to nonlinear equations, adaptive n-body simulations using the fast multipole method, and image processing.
We are developing a run-time system called RAPID which integrates automatic scheduling techniques and efficient communication schemes for irregular task computations with mixed granularities on message-passing distributed memory machines. The system provides a set of library functions for specifying irregular data objects and tasks that access these objects. It extracts a task dependence graph from data access patterns, and executes tasks efficiently on a distributed memory machine.
Our experimental results on Cray-T3D/T3E and Meiko CS-2 indicate that the RAPID system obtains promising performance in sparse matrix problems and n-body simulation. In particular, using the RAPID system we have obtained high megaflop rate for parallel sparse LU/Gaussian elimination with partial pivoting, which is an open parallelization problem in scientific computing literature. We have optimized our sparse LU code with pivoting and have achieved upto 11.04 GFLOPS on 128 Cray T3E nodes, which is the highest megaflops performance in literature. The previous record was 2.583 GFLOPS on a shared memory machine.
Recently we have also worked on scheduling and runtime support for clustering Web and digital library servers .
This project is supported in part by NSF CAREER CCR-9702640 and a DARPA grant (ONR Contract Number N6600197C8534).

Download RAPID1.0 for Cray-T3E.

Selected Publications on scheduling/run-time support and irregular/sparse applications:

Scheduling and run-time support

T. Yang, and C. Fu, Space/Time-Efficient Scheduling and Execution of Parallel Irregular Computations, To appear in ACM Transactions on Programming Languages and Systems . 1999
C. Fu and T. Yang. Space and Time Efficient Execution of Parallel Irregular Computation. in ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, Las Vegas, June, 1997. Talk slides.
C. Fu, Scheduling and runtime support for Irregular Computations, Ph.D Thesis, UCSB, Nov. 1997.
C. Fu and T. Yang. Run-time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures. Accepted for publication by Journal of Parallel and Distributed Computing, 1997.
T. Yang and C. Fu. Heuristic Algorithms for Scheduling Iterative Task Graphs on Distributed Memory Machines. in IEEE Transactions on Parallel and Distributed Systems, 1997.
C. Lee,Y.-F., Wang, T. Yang, Global Optimization for Mapping Parallel Image Processing Tasks, in Journal of Parallel and Distributed Computing}. V. 45, 29-45, 1997.
C. Fu and T. Yang, Run-time Compilation for Parallel Sparse Matrix Computations, in Proc. of the 10th ACM SIGARCH International Conference on Supercomputing, Philadelphia, pp. 237-244, May, 1996. Talk slides.
C. Fu and T. Yang, Efficient Run-time Support for Irregular Task Computations with Mixed Granularities, in Proc. of IPPS '96 - 10th Inter. Parallel Processing Symposium , IEEE. Hawaii, pp. 823-830, April, 1996. Talk slides.
T. Yang, C. Fu, A. Gerasoulis and V. Sarkar, Mapping iterative task graphs on distributed-memory machines. Proc. of 24th Inter. Conference on Parallel Processing, Aug. 1995, Vol II, pp. 151-158.
T. Yang, O. Ibarra, Performance Prediction in Symbolic Scheduling of Partitioned Programs with Weight Variation . in Journal of Parallel and Distributed Computing, 1997. A short version appears in Proceedings of IEEE SPDP'95.

Sparse matrix solvers and irregular applications

B. Jiang, S. Richman, K. Shen, and T. Yang, Fast Sparse LU Factorization with Lazy Space Allocation. To appear in SIAM 1999 Parallel Processing Conference on Scientific Computing.
Kai Shen, Xiangmin Jiao and Tao Yang, Elimination Forest Guided 2D Sparse LU Factorization, Proc. of ACM Symp. on Parallel Architectures and Algorithms (SPAA'98) 5-15. Talk slides.
A long version is the technical report TRCS98-08.
C. Fu, X. Jiao and T. Yang. Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures. in IEEE Tran. on Parallel and Distributed Systems, Feb 1998.
C. Fu, X. Jiao and T. Yang. A Comparison of 1-D and 2-D Data Mapping for Sparse LU Factorization on Distributed Memory Machines. in 8th SIAM Conference on Parallel Processing for Scientific Computing, Minneapolis, March, 1997. Talk slides .
C. Fu and T. Yang, Sparse LU Factorization with Partial Pivoting on Distributed Memory Machines. in Proc. of ACM/IEEE SuperComputing'96, November, 1996, Pittsburgh. A long version is the technical report TRCS96-18.
Y. Yu, O. Ibarra, T. Yang, Parallel Progressive Radiosity with Adaptive Meshing , in the Journal of Parallel and Distributed Computing (JPDC), 1997. A short version appeared in Lecture Notes in Computer Science, Proc. of Irregular '96, Santa Barbara, Aug. 1996.

More publications on scheduling and runtime support

Current members:

Steven Richman (joy@cs.ucsb.edu)

Kai Shen (kshen@cs.ucsb.edu)

Tao Yang (tyang@cs.ucsb.edu)

Past members :

Cong Fu (PhD, 1997. Currently with SUN MICRO)

Xiangmin Jiao (MS 1997. Currently with UIUC)

Bin Jiang (MS 1999. Currently with Environmental Systems Research Institute)

You are visitor No. since February 5, 1997.

RAPID: Scheduling and Run-Time Support for Parallel Irregular Computations.

S+: Fast Sparse Gaussian Elimination (LU Factorization) Code

Download RAPID1.0 for Cray-T3E.

Selected Publications on scheduling/run-time support and irregular/sparse applications:

Scheduling and run-time support

Sparse matrix solvers and irregular applications

More publications on scheduling and runtime support

Current members:

Steven Richman (joy@cs.ucsb.edu)

Kai Shen (kshen@cs.ucsb.edu)

Tao Yang (tyang@cs.ucsb.edu)

Past members :

Cong Fu (PhD, 1997. Currently with SUN MICRO)

Xiangmin Jiao (MS 1997. Currently with UIUC)

Bin Jiang (MS 1999. Currently with Environmental Systems Research Institute)

You are visitor No. since February 5, 1997.

RAPID: Scheduling and Run-Time Support for Parallel Irregular Computations. S+: Fast Sparse Gaussian Elimination (LU Factorization) Code

Download RAPID1.0 for Cray-T3E.

Selected Publications on scheduling/run-time support and irregular/sparse applications:

Scheduling and run-time support

Sparse matrix solvers and irregular applications

More publications on scheduling and runtime support

Current members: Steven Richman (joy@cs.ucsb.edu) Kai Shen (kshen@cs.ucsb.edu) Tao Yang (tyang@cs.ucsb.edu)

Past members : Cong Fu (PhD, 1997. Currently with SUN MICRO) Xiangmin Jiao (MS 1997. Currently with UIUC) Bin Jiang (MS 1999. Currently with Environmental Systems Research Institute)

You are visitor No. since February 5, 1997.

RAPID: Scheduling and Run-Time Support for Parallel Irregular Computations.

S+: Fast Sparse Gaussian Elimination (LU Factorization) Code

Current members:

Steven Richman (joy@cs.ucsb.edu)

Kai Shen (kshen@cs.ucsb.edu)

Tao Yang (tyang@cs.ucsb.edu)

Past members :

Cong Fu (PhD, 1997. Currently with SUN MICRO)

Xiangmin Jiao (MS 1997. Currently with UIUC)

Bin Jiang (MS 1999. Currently with Environmental Systems Research Institute)