CS140 Final Project: Conjugate Gradient Solver (MPI)

The objective of this project is to write a parallel subroutine that multiplies a sparse matrix by a dense vector (yielding another dense vector), and then to use that "matvec" routine in a solver for a symmetric positive definite system of linear equations by the conjugate gradient method, also known as "CG". This is an MPI project, because many scientific applications of sparse linear systems need to solve truly enormous systems that would be too big for the current generation of multicore processors. (This will change as the number of cores per processor increases over the next few years.)

CG is a spiffy algorithm for solving a linear system of the form Ax = b. It's newer and usually a lot faster than the Jacobi relaxation or SOR methods described in the book, provided the matrix A is symmetric and has positive eigenvalues. The theory behind CG, while mathematically very pretty, is too far afield from this course for us to go into. (If you want to go into the theory, here is a good description.)

All you need to know for this project is what the algorithm does, which is pretty simple. CG is an iterative method, which means that it starts out with a guess at the value of x (usually the vector of all zeros), and then it does "iterations", one after another, improving the guess at each one, until the guess is close enough. The main work of an iteration is one multiplication of a (dense) vector by the (sparse) matrix A, plus two dot products of pairs of vectors. The algorithm is outlined on page 3 of these slides. Also, there is a sequential Matlab code for CG here.
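To make the shape of an iteration concrete, here is a minimal sequential sketch of the CG loop in C. It is only a sketch, not a required interface: dot is written out for completeness, matvec is a placeholder for the sparse matrix-vector product you will write, and the stopping test (relative residual) is just one reasonable choice that mirrors the Matlab code.

    #include <math.h>
    #include <stdlib.h>

    /* dot is written out here; matvec is a placeholder for the sparse
       matrix-vector product Ad = A*d that you will supply. */
    double dot(const double *x, const double *y, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += x[i] * y[i];
        return s;
    }
    void matvec(double *Ad, const double *d, int n);   /* Ad = A*d, supplied by you */

    /* Conjugate gradient for Ax = b, starting from x = 0 and stopping when
       the residual norm falls below reltol times the norm of b. */
    void cg(int n, const double *b, double *x, int maxit, double reltol) {
        double *r  = malloc(n * sizeof(double));   /* residual, r = b - A*x */
        double *d  = malloc(n * sizeof(double));   /* search direction      */
        double *Ad = malloc(n * sizeof(double));   /* result of the matvec  */
        for (int i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; d[i] = b[i]; }
        double rtr   = dot(r, r, n);
        double normb = sqrt(dot(b, b, n));
        for (int it = 0; it < maxit && sqrt(rtr) > reltol * normb; it++) {
            matvec(Ad, d, n);                      /* the one sparse matvec */
            double alpha = rtr / dot(d, Ad, n);    /* dot product #1        */
            for (int i = 0; i < n; i++) x[i] += alpha * d[i];
            for (int i = 0; i < n; i++) r[i] -= alpha * Ad[i];
            double rtr_new = dot(r, r, n);         /* dot product #2        */
            for (int i = 0; i < n; i++) d[i] = r[i] + (rtr_new / rtr) * d[i];
            rtr = rtr_new;
        }
        free(r); free(d); free(Ad);
    }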

The matvec routine is partly an exercise in data structures for irregular, sparse objects. You'll use a compressed sparse row (CSR) data structure that can store any sparse matrix in space proportional to the number of nonzero elements in the matrix.
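Here is one common way to lay out the compressed sparse row structure and its sequential matvec in C; the type and field names below are just a suggestion, not part of the assignment.

    /* Compressed sparse row (CSR) storage for an m-by-n matrix with nnz nonzeros.
       Row i's nonzeros sit in positions rowptr[i] .. rowptr[i+1]-1 of colind and
       val, so the total storage is proportional to nnz. */
    typedef struct {
        int     m, n, nnz;
        int    *rowptr;   /* length m+1 */
        int    *colind;   /* length nnz: column index of each nonzero */
        double *val;      /* length nnz: value of each nonzero        */
    } csr_matrix;

    /* y = A*x: one pass over the nonzeros, row by row. */
    void csr_matvec(const csr_matrix *A, const double *x, double *y) {
        for (int i = 0; i < A->m; i++) {
            double sum = 0.0;
            for (int j = A->rowptr[i]; j < A->rowptr[i+1]; j++)
                sum += A->val[j] * x[A->colind[j]];
            y[i] = sum;
        }
    }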

There is a sequential Matlab code on the course web site. The Matlab code includes a harness with a data generator and validator, which you can use to check the correctness of your code. The data generator can produce an identity matrix, or the matrix whose graph is a regular k-by-k-by-k grid corresponding to the discretized Poisson equation in three dimensions. The Matlab routine "cgsolve.m" is the conjugate gradient solver. The sparse matvec is built into Matlab; it is just the line "Ad = A*d;"

You will generate the matrix data distributed over the processors by rows, with n/p rows of an n-by-n matrix on each of p processors. You may rearrange the matrix data if you wish in order to make the sequence of matvecs faster.
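As a sketch of the bookkeeping (one possible convention, not a requirement), each process can compute its block of rows from its rank and the number of processes:

    /* Compute the half-open range [*first, *last) of global rows owned by
       process `rank` out of `p`, for an n-by-n matrix distributed by rows.
       If p does not divide n evenly, the last process takes the leftover rows. */
    void my_row_range(int n, int rank, int p, int *first, int *last) {
        int rows_per_proc = n / p;
        *first = rank * rows_per_proc;
        *last  = (rank == p - 1) ? n : *first + rows_per_proc;
    }

    /* Typical use, after MPI_Init:
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         MPI_Comm_size(MPI_COMM_WORLD, &p);
         my_row_range(n, rank, p, &first_row, &last_row);                  */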

Generating realistic data in parallel can be a bit tricky. You should first implement and test the trivial case of a diagonal matrix, for example with A(i,i) = i for all i, just as a reality check. Then you should write a parallel generator for the "model problem", which is the k-by-k 2D grid of n = k^2 equations in n = k^2 unknowns. Every row of the matrix has a +4 on the diagonal, and most rows also have four -1's: one place before the diagonal, one place after it, k places before it, and k places after it. However, the k equations for points on the leftmost side of the grid are missing the -1 one place before the diagonal; the k equations for the rightmost points are missing the -1 one place after the diagonal; and the k equations for the top points and the k equations for the bottom points are missing the -1 k places before or k places after the diagonal, respectively. It's not too hard to write code for each processor to generate its own n/p rows of this matrix, checking the special cases, but you'll want to debug carefully and print lots of examples to make sure you're getting the right matrix. You can assume that k is divisible by p if that makes it easier. Also, you're welcome to come talk to us about various ways of getting sample data for this project.
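Here is a sketch of how a process might fill in its own block of rows of the model problem in CSR form. It assumes zero-based numbering with row r corresponding to grid point (r / k, r % k), reuses the csr_matrix layout sketched earlier, and takes the process's row range (for example from my_row_range above) as arguments; none of these choices are required.

    #include <stdlib.h>

    /* Generate rows first_row .. last_row-1 of the k-by-k model problem
       (n = k*k) in CSR form, with global column indices.  Row r is the
       equation for grid point (i, j) = (r / k, r % k); the tests on i and j
       handle the boundary rows that are missing some of the -1's. */
    void generate_model_problem(int k, int first_row, int last_row, csr_matrix *A) {
        int n = k * k;
        int my_rows = last_row - first_row;
        A->m = my_rows;
        A->n = n;
        A->rowptr = malloc((my_rows + 1) * sizeof(int));
        A->colind = malloc(5 * my_rows * sizeof(int));    /* at most 5 nonzeros per row */
        A->val    = malloc(5 * my_rows * sizeof(double));
        int nnz = 0;
        A->rowptr[0] = 0;
        for (int r = first_row; r < last_row; r++) {
            int i = r / k, j = r % k;
            if (i > 0)     { A->colind[nnz] = r - k; A->val[nnz++] = -1.0; }  /* k places before */
            if (j > 0)     { A->colind[nnz] = r - 1; A->val[nnz++] = -1.0; }  /* 1 place before  */
                             A->colind[nnz] = r;     A->val[nnz++] =  4.0;    /* diagonal        */
            if (j < k - 1) { A->colind[nnz] = r + 1; A->val[nnz++] = -1.0; }  /* 1 place after   */
            if (i < k - 1) { A->colind[nnz] = r + k; A->val[nnz++] = -1.0; }  /* k places after  */
            A->rowptr[r - first_row + 1] = nnz;
        }
        A->nnz = nnz;
    }

Printing a small case (say k = 4 on 2 processes) and comparing it against the Matlab generator is a good first debugging step.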

The data distribution and communication for this problem are up to you. You will probably want to think about not communicating more of the vector than necessary in the matvec. You may want to experiment with rearranging the rows and columns of the matrix to reduce the amount of communication; for fairly small matrices this is unlikely to help, but it might be useful for the biggest matrices you can run. You should be able to do pretty large problems on a parallel machine; on my laptop, the sequential Matlab code takes about 2 minutes for a 1,000,000-by-1,000,000 matrix.
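As a baseline to get you started (deliberately not the most efficient pattern), here is a sketch of the two distributed kernels CG needs, assuming each process owns n/p consecutive entries of every vector, matching the row distribution, and that p divides n. The matvec below gathers the whole vector with MPI_Allgather before calling the local CSR matvec from the earlier sketch, which is exactly the "more of the vector than necessary" communication you should try to improve on for large problems.

    #include <mpi.h>

    /* Global dot product of two block-distributed vectors of local length nlocal. */
    double par_dot(const double *x, const double *y, int nlocal) {
        double local = 0.0, global;
        for (int i = 0; i < nlocal; i++) local += x[i] * y[i];
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return global;
    }

    /* y_local = A_local * x, where A_local holds this process's nlocal rows in CSR
       form with global column indices, and x_full is scratch space of length n.
       Gathering all of x is simple but communicates more than necessary. */
    void par_matvec(const csr_matrix *A, const double *x_local, int nlocal,
                    double *y_local, double *x_full) {
        MPI_Allgather(x_local, nlocal, MPI_DOUBLE,
                      x_full,  nlocal, MPI_DOUBLE, MPI_COMM_WORLD);
        csr_matvec(A, x_full, y_local);
    }

A better version would figure out, once, which remote vector entries each block of rows actually touches and exchange only those; for the model problem, each process needs only k entries from the process just before it and k entries from the process just after it.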