This project focuses on the study of
scheduling algorithms and software systems for exploiting data, task and
loop parallelism
on distributed memory architectures and workstation clusters.
The fast scheduling algorithms we have developed provide effective
utilization of computing resources for directed acyclic graphs, iterative
task graphs with and without cycles, and task graphs with data parallelism.
The developed techniques can be used for performance prediction and
code optimization.
The main applications are
scientific computations such as sparse matrix factorization arising
from numerical solutions to nonlinear equations, adaptive n-body simulations
using the fast multipole method, and image processing.
We are developing a run-time system called RAPID
which integrates automatic scheduling techniques and efficient
communication schemes for irregular task computations with mixed
granularities on message-passing distributed memory machines.
The system provides a set of library functions
for specifying irregular data objects
and tasks that access these objects. It extracts a task dependence
graph from data access patterns, and executes tasks efficiently on a
distributed memory machine.
Our experimental results on Cray-T3D/T3E and Meiko CS-2 indicate
that the RAPID system obtains promising performance in sparse
matrix problems and n-body simulation.
In particular, using the RAPID system we have obtained
high megaflop rate for parallel sparse LU/Gaussian elimination
with partial pivoting, which is an open parallelization problem
in scientific computing literature.
We have optimized our sparse LU code with pivoting and have
achieved upto 11.04 GFLOPS on 128 Cray T3E nodes, which is the highest
megaflops performance in literature.
The previous record was
2.583 GFLOPS on a shared memory machine.
Recently we have also worked on
scheduling and runtime support
for clustering Web and digital library servers .
This project is supported in part by
NSF CAREER CCR-9702640 and
a DARPA grant (ONR Contract Number N6600197C8534).
Download RAPID1.0 for Cray-T3E.
Selected Publications on scheduling/run-time support and irregular/sparse
applications:
Scheduling and run-time support
-
T. Yang, and C. Fu,
Space/Time-Efficient Scheduling and
Execution of Parallel Irregular Computations,
To appear in
ACM Transactions on Programming Languages and Systems . 1999
- C. Fu and T. Yang.
Space and Time Efficient Execution of
Parallel Irregular Computation.
in ACM SIGPLAN Symposium on Principles & Practice of Parallel
Programming, Las Vegas, June, 1997.
Talk slides.
- C. Fu,
Scheduling and runtime support for Irregular Computations,
Ph.D Thesis, UCSB, Nov. 1997.
- C. Fu and T. Yang.
Run-time Techniques for Exploiting Irregular
Task Parallelism on Distributed Memory Architectures.
Accepted for publication by
Journal of Parallel and Distributed Computing, 1997.
- T. Yang and C. Fu.
Heuristic Algorithms for Scheduling Iterative
Task Graphs on Distributed Memory Machines. in IEEE
Transactions on Parallel and Distributed Systems, 1997.
-
C. Lee,Y.-F., Wang, T. Yang,
Global Optimization for Mapping Parallel Image Processing Tasks,
in Journal of Parallel and Distributed Computing}.
V. 45, 29-45, 1997.
- C. Fu and T. Yang,
Run-time Compilation for Parallel Sparse Matrix Computations,
in Proc. of the 10th ACM SIGARCH International Conference on
Supercomputing, Philadelphia, pp. 237-244, May, 1996.
Talk slides.
- C. Fu and T. Yang,
Efficient Run-time Support for Irregular Task Computations with Mixed
Granularities,
in Proc. of IPPS '96 - 10th Inter. Parallel Processing Symposium , IEEE.
Hawaii, pp. 823-830, April, 1996.
Talk slides.
- T. Yang, C. Fu, A. Gerasoulis and V. Sarkar,
Mapping
iterative task graphs on distributed-memory machines.
Proc. of 24th Inter. Conference on
Parallel Processing, Aug. 1995, Vol II, pp. 151-158.
- T. Yang, O. Ibarra,
Performance Prediction in Symbolic Scheduling of Partitioned
Programs with Weight Variation .
in Journal of Parallel and Distributed Computing, 1997.
A short version appears in Proceedings of IEEE SPDP'95.
Sparse matrix solvers and irregular applications
-
B. Jiang, S. Richman, K. Shen, and T. Yang,
Fast Sparse LU Factorization with Lazy Space Allocation.
To appear in
SIAM 1999 Parallel Processing Conference on Scientific Computing.
- Kai Shen, Xiangmin Jiao and Tao Yang,
Elimination Forest Guided 2D Sparse LU Factorization,
Proc. of
ACM Symp. on Parallel Architectures and Algorithms (SPAA'98) 5-15.
Talk slides.
A long version is the technical report
TRCS98-08.
- C. Fu, X. Jiao and T. Yang.
Efficient Sparse LU Factorization with Partial Pivoting
on Distributed Memory Architectures.
in IEEE Tran. on Parallel and Distributed Systems, Feb 1998.
- C. Fu, X. Jiao and T. Yang.
A Comparison of 1-D and 2-D Data
Mapping for Sparse LU Factorization on Distributed Memory Machines.
in 8th SIAM Conference on Parallel Processing for Scientific
Computing, Minneapolis, March, 1997.
Talk slides .
- C. Fu and T. Yang,
Sparse LU Factorization with Partial Pivoting
on Distributed Memory Machines.
in Proc. of ACM/IEEE SuperComputing'96,
November, 1996, Pittsburgh.
A long version is the technical report
TRCS96-18.
-
Y. Yu, O. Ibarra, T. Yang,
Parallel Progressive Radiosity with Adaptive Meshing ,
in the Journal of Parallel and Distributed Computing (JPDC), 1997.
A short version appeared in Lecture Notes in Computer Science,
Proc. of Irregular '96, Santa Barbara, Aug. 1996.
Current members:
Past members :
- Cong Fu (PhD, 1997. Currently with SUN MICRO)
- Xiangmin Jiao (MS 1997. Currently with UIUC)
- Bin Jiang (MS 1999. Currently with Environmental Systems Research Institute)
You are visitor No.
since February 5, 1997.