S+: Fast Sparse Gaussian Elimination (LU Factorization)
Sparse LU factorization with partial pivoting is important for many scientific
applications and delivering high performance for this problem is difficult on
distributed memory machines. This project studies the properties of elimination
forests and uses them to guide supernode partitioning/amalgamation and execution
scheduling. This design with 2D mapping effectively identifies dense structures
without introducing too many zeros in the BLAS computation and exploits
asynchronous parallelism with low buffer space cost. The implementation of this
code, called S+, uses supernodal matrix multiplication which retains the BLAS-3
level efficiency and avoids unnecessary arithmetic operations. This project also
studies two space optimization techniques which can greatly improve the worst-case
performance of static symbolic factorization.The experiments
show that S+ can achieve up to 10.85GFLOPS on 128 Cray T3E 450MHz nodes, which
is the highest performance reported in the literature. The previous record was
2.583 GFLOPS
on a shared memory machine. This project is currently supported in part by NSF
CAREER CCR-9702640.
S+ has also been tested and packaged by Sun Microsystems and is released as part
of Sun HPC ClusterTools 4 Software.
Download S+ 1.0. Please cite
this if you use S+ in your work.
S+ 1.0 runs on SGI Origin 2000/Power Challenge and
Cray T3E. This release uses MPI and BLAS library. You may port to another machine
which also supports MPI and BLAS (please let us know if you have done that
successfully or you have application experience with S+).
Selected Publications on
Fast Sparse Gaussian Elimination (LU Factorization):
- K. Shen, T. Yang, and X. Jiao,
S+: Efficient 2D Sparse LU Factorization on Parallel Machines.
In SIAM Journal on Matrix Analysis and Applications (SIMAX) ,
Volume 22, Number 1, Pages 282-305, 2000.
- B. Jiang, S. Richman, K. Shen, and T. Yang,
Fast Sparse LU Factorization with Lazy Space Allocation.
In Proc. of SIAM 1999 Parallel Processing Conference on Scientific Computing. ,
San Antonio TX, March 1999.
Talk slides.
- K. Shen, X. Jiao and T. Yang,
Elimination Forest Guided 2D Sparse LU Factorization.
In Proc. of ACM Symp. on Parallel Architectures and Algorithms (SPAA'98),
Pages 5-15, Puerto Vallarta, Mexico, June 1998.
Talk slides.
A long version is the technical report
TRCS98-08.
- C. Fu, X. Jiao, and T. Yang,
Efficient Sparse LU Factorization with Partial Pivoting
on Distributed Memory Architectures.
In IEEE Transactions on Parallel and Distributed Systems, Feb 1998.
- C. Fu, X. Jiao, and T. Yang,
A Comparison of 1-D and 2-D Data
Mapping for Sparse LU Factorization on Distributed Memory Machines.
IN Proc. of 8th SIAM Conference on Parallel Processing for Scientific Computing,
Minneapolis, March 1997.
Talk slides .
- C. Fu and T. Yang,
Sparse LU Factorization with Partial Pivoting on Distributed Memory Machines.
In Proc. of ACM/IEEE SuperComputing'96, Pittsburgh, November 1996.
A long version is the technical report
TRCS96-18.
- C. Fu,
Scheduling and runtime support for Irregular Computations.
Ph.D. Thesis, UCSB, November 1997.
- X. Jiang,
Parallel Sparse Gaussian Elimination with Partial Pivoting
and 2-D Data Mapping,
Master Thesis, UCSB, August 1997.
Current members:
Past members :
- Cong Fu
(PhD, 1997. Currently with Sun Microsystems)
- Xiangmin Jiao
(MS 1997. Currently with UIUC)
- Bin Jiang
(MS 1999. Currently with Environmental Systems Research Institute)
- Steven Richman