CS240A HW#3: README
===================

For this exercise, you have to implement a parallel version of the
Conjugate Gradient (CG) method to solve a sparse linear system of the form
Ax = b. There is a MATLAB solution included in the harness which you can
use as a reference.

The program (harness3.c) takes 2 command-line arguments: the matrix size (n)
and the "type" of input matrix (whichstart). Given these values, the harness
will generate a sparse matrix (A) and distribute it among the parallel
threads of your program. The matrix is distributed using block row
decomposition so that each thread will get n/p rows of the matrix, where
p = total number of processors. When n is not evenly divisible by p, each
thread except for the last one will get floor(n/p) rows and the last thread
will get the rest. For example, for n = 10 and p = 3, thread #0 will get the
first 3 rows, thread #1 will get the next 3 and thread #2 will get the last
4 rows. The harness will call your cgsolve_ function in each thread with the
corresponding "chunk" of A.

The right-hand-side vector (b) of the linear system is assumed to be of the
form [1 2 3 ... n] and you can generate it yourself.

The harness can generate two "types" of matrices that you can use as an
input matrix (A):

   whichstart = 0 will generate an Identity matrix
   whichstart = 1 will generate a matrix corresponding to the 3D Model Problem

You can set the values of n and whichstart in the Make.inc file.

You need to implement the cgsolve_ function in upccg.c (for UPC) or mpicg.c
(for MPI) depending on which group you are assigned to in the class (select
the compilation target in the Make.inc file). The function has the following
parameters:

   n:             The size of the matrix
   start, stop:   The start and end indices of the rows of A assigned to the
                  thread
   x, y, v:       The non-zero elements in the rows of A assigned to the
                  thread. Vectors x, y and v hold the row indices, column
                  indices and values of the non-zero elements respectively.
   xysize:        The length of the vectors x, y and v (or the total number
                  of non-zeros)
   niters:        The number of iterations that CG took to converge to a
                  solution
   norm1, norm2:  The 1-norm and 2-norm of the solution vector

niters, norm1 and norm2 are output arguments: you need to set them to your
computed values so that the harness can validate the correctness of your
solution. Note that validation is not available for all possible values of n
in the case of the 3D Model Problem (whichstart = 1). When it is not
available, the harness will print "CORRECT: -1" in the output. Refer to the
function val1() in validate3.h for the appropriate values of n and also to
learn more about how the validation is done.

Instructions for compiling & running code
-----------------------------------------

At the top of the Make.inc file, you can select the target (Serial, UPC or
MPI), the number of nodes and tasks/node to run and the problem parameters
(n, whichstart).

To compile:
   make upc      (OR make mpi for MPI OR make serial for Serial)

To run interactively:
   make runupc   (OR make runmpi for MPI OR make runserial for Serial)

To run using the batch queue:
   make runupcq  (OR make runmpiq for MPI)

To clean up object files and executables:
   make clean

To update the harness:
   make update

Updates/bug-fixes to the harness will be announced on the mailing list.

Note: The serial version is not required as part of the homework, but it
might help to develop the serial code first and then the parallel version.
You can debug your serial code with the dbx debugger (similar to gdb) on
DataStar.
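
As an illustration of the block row decomposition and the right-hand side
described earlier in this README, here is a minimal C sketch. It is not part
of the harness: the helper names (block_rows, fill_local_b, local_norm_sums)
are hypothetical, and the 0-based, half-open row range [first, last) is an
assumption; check harness3.c for the convention that start and stop actually
use.

```c
#include <stddef.h>

/* Hypothetical helper: rows owned by thread `rank` under block row
 * decomposition. Every thread gets floor(n/p) rows; the last thread also
 * takes the leftover rows. ASSUMPTION: 0-based, half-open [first, last). */
static void block_rows(int n, int p, int rank, int *first, int *last)
{
    int chunk = n / p;                      /* floor(n/p) for n, p > 0 */
    *first = rank * chunk;
    *last  = (rank == p - 1) ? n : *first + chunk;
}

/* Hypothetical helper: fill this thread's slice of b = [1 2 3 ... n].
 * Global row i (0-based) holds the value i + 1. */
static void fill_local_b(double *b_local, int first, int last)
{
    for (int i = first; i < last; i++)
        b_local[i - first] = (double)(i + 1);
}

/* Hypothetical helper: local contributions to the 1-norm and 2-norm.
 * abs_sum is the sum of |x_i|, sq_sum is the sum of x_i^2 over the
 * local slice. The square root for the 2-norm is taken only after the
 * contributions from all threads have been added together. */
static void local_norm_sums(const double *x_local, int len,
                            double *abs_sum, double *sq_sum)
{
    double a = 0.0, s = 0.0;
    for (int i = 0; i < len; i++) {
        double v = x_local[i];
        a += (v < 0.0) ? -v : v;
        s += v * v;
    }
    *abs_sum = a;
    *sq_sum  = s;
}
```

For n = 10 and p = 3 this yields the row ranges [0, 3), [3, 6) and [6, 10),
matching the 3/3/4 split in the example above. In a parallel solver the two
partial sums from each thread would be combined with a sum reduction (for
instance MPI_Allreduce with MPI_SUM in the MPI version) before norm2 is
computed as the square root of the global sum of squares.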