The assignment consists of writing a routine that performs the power method. Given an nxn matrix A and an nx1 vector b, their product A*b can be computed in parallel. Recall that the product of a matrix and a vector is another vector. We will do this multiplication k times, and before every multiplication we scale the vector b by its L2 norm [1], so you will do:

    do k times {
        b = b / norm(b);
        b = A * b;
    }

The skeleton code given provides the framework to which you should adhere. You have to write three functions. Write these in a separate file called hw1-mpi.c, hw1-omp.c, hw1-upc.c, or hw1-mpi.F90, hw1-caf.F90, so that the skeleton is just a driver program which calls your routines. This also ensures that we will be able to call your functions properly when testing for correctness.

    void power_distribute(double *A, double *b, long n);

The power_distribute function takes as arguments the matrix A, the vector b, and the dimension n. The matrix is given in column-major order, as an array of n*n elements. For the 3x3 matrix

    ( 1 4 7
      2 5 8
      3 6 9 )

A will be an array of doubles containing the values (1, 2, 3, ..., 9):

    A[0] = 1; A[1] = 2; ... A[8] = 9

In general, with zero-based indices, element (i, j) of the matrix is stored at A[i + j*n]. Since A is visible as a long vector instead of a matrix, you need the value of n to know the dimensions of the matrix. The purpose of this function is to distribute your data as you require for your parallel computation. Nothing is returned.

    void power_method(long k);

In the power_method function, you implement the power method in parallel. The number of iterations is passed as the argument k. A single step of the method first scales the vector b so that its L2 norm is one, by dividing every element of the vector by the norm of the vector. Then we multiply A and b to get the new vector. This step has to be done k times.
In pseudo-Matlab code, the whole method is:

    do k times {
        b[1:n] = b[1:n] / norm(b);
        b = A*b
    }

The result vector b will need to be distributed to the other CPUs, so that the next step can read the right values of b.

    double power_answer(void);

In the power_answer function, you return the L2 norm of the vector b. All norms are L2 norms, as explained in [1].

Your matrix A will have to be distributed, and most probably b as well. After a single multiplication step, the values of b must be updated on all processors, so you will have to do some sort of synchronization to get the new values of b to each processor.

Implement the above in OpenMP (C or Fortran), MPI (C or Fortran), Matlab *P, and either Co-Array Fortran or UPC. For UPC, OpenMP in C, and MPI in C, use the framework given above. For Co-Array Fortran, and for OpenMP and MPI in Fortran, make a similar framework. In Matlab *P, write the three functions as Matlab functions.

Include your source code, and plots showing the speedup obtained. Draw graphs with the number of processors on the X-axis and the wallclock time on the Y-axis, for all four programs.

[1] The L2 norm of a vector (x_1, x_2, ..., x_n) is defined as:

    sqrt( x_1*x_1 + x_2*x_2 + ... + x_n*x_n )