Project Specification: Fault Tolerant Matlab*P
Nisheeth Shrivastava
Rachit Chawla
---------------------------------------------------------------

Matlab*P is a distributed library for scientific computing. It provides an interactive and easy way for users to manipulate large data sets residing on parallel machines and to run intensive computations on them. Such computations often take hours to complete, which makes them susceptible to network or node failures.

In our project, we are going to develop a user-transparent, fault-tolerant library integrated into Matlab*P that recovers a slave process after it fails. We are going to use a checkpoint-based rollback recovery algorithm. A checkpoint records the execution state of a slave process periodically. In case of a failure, the failed process is restarted from the last checkpoint saved to stable storage, so that it does not need to repeat all the computation done so far.

The aim is that the user need not write anything in the program to handle checkpointing or recovery. The underlying protocol does this without any explicit interaction from the user. Another objective is to minimize the changes required in the Matlab*P interface provided to the user.
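
To make the checkpoint and rollback idea concrete, here is a minimal single-process sketch in Python. It is an illustration only, not the Matlab*P design: the names save_checkpoint, load_checkpoint, and run_worker, the fail_at parameter, and the use of a local file as "stable storage" are all assumptions made for this example. The worker saves its state every few steps, and a restarted worker resumes from the last saved state instead of starting over.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write `state` to `path` atomically (temp file first, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before renaming
    # A crash before this line leaves the previous checkpoint untouched.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_worker(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.

    `fail_at` is a hypothetical hook that simulates a crash at that step.
    """
    # Rollback: resume from the last checkpoint if one exists.
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated slave failure")
        state["acc"] += state["step"]  # one unit of "computation"
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

For example, calling run_worker(path, 10, 3, fail_at=7) raises after checkpoints have been written at steps 3 and 6. Calling run_worker(path, 10, 3) again with the same path resumes at step 6, repeats only the work done since that checkpoint, and returns the same sum as an uninterrupted run. The caller never handles the checkpoint file directly, which mirrors the transparency goal stated above.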