CS240 Assignment 0
Fenglin Liao
First-year PhD student in the Computer Science Department, interested in data management and distributed systems. I am eager to improve my programming skills through a variety of courses. My goal in taking this class is to become familiar with MPI and UPC and to learn how to design efficient parallel algorithms for different applications.
Accelerating comparative genomics using parallel computing
What is the scientific or engineering problem being solved?
With the rapid increase in the number of completely sequenced genomes, enormous amounts of biological sequence data are flooding into sequence databases, making efficient tools for comparative genome sequence analysis necessary. Although various sequence analysis tools implementing the FASTA and Smith-Waterman algorithms are available, analyzing large genome datasets with these tools on uniprocessor machines appears impractical. The objective is therefore to improve the performance of these popular sequence analysis tools on parallel cluster computers.
The challenges come mainly from two sources: the increasing length of the query sequences and the increasing size of the databases. The original sequential tools perform poorly as query lengths and database sizes grow, so multiprocessor systems are needed to improve performance.
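To see why both query length and database size drive the cost, here is a minimal sketch of Smith-Waterman local-alignment scoring. It assumes a linear gap penalty and illustrative scores (match=2, mismatch=-1, gap=-1); these parameters and the function names are hypothetical, not those of the tools studied, which use substitution matrices and affine gaps. Filling the dynamic-programming table takes O(m·n) time for a query of length m against a database sequence of length n, so a full search costs roughly len(query) times the total database length.

```python
def smith_waterman_score(query: str, target: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score of query vs. target (illustrative scoring).

    Time is O(len(query) * len(target)); memory is O(len(target)) because
    only the previous row of the DP table is kept.
    """
    prev = [0] * (len(target) + 1)  # row i-1 of the DP table
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            score = match if query[i - 1] == target[j - 1] else mismatch
            diag = prev[j - 1] + score   # align query[i-1] with target[j-1]
            up = prev[j] + gap           # gap in target
            left = curr[j - 1] + gap     # gap in query
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: dict) -> list:
    """Score query against every sequence; total work ~ len(query) * sum(len(seq))."""
    return sorted(((smith_waterman_score(query, seq), name)
                   for name, seq in database.items()), reverse=True)
```

For example, `search_database("ACGT", {"s1": "TTACGTT", "s2": "GGGG"})` ranks `s1` first with score 8 (an exact four-residue match) and `s2` second with score 2. Because each database sequence is scored independently, the database can be split across processors, which is what the parallel versions exploit.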
The solution is to use a parallel computing approach to implement the FASTA and Smith-Waterman algorithms on the PARAM system, a cluster of Sun Ultra E450 workstations.
How well did the application achieve its scientific / engineering objective? Are simulation results compared to physical results?
The performance of the FASTA and Smith-Waterman algorithms on PARAM was studied using two approaches: tuning the codes with message-passing libraries, and exploiting low-level parallelism with a distributed computing approach.
The former is a master-worker approach: one processor acts as the master and distributes the search jobs to the other n-1 workers without performing any search itself. Speedup was calculated by dividing the elapsed time of a run on two processors (a single worker, since the master does not search the database) by the elapsed time on n processors (n-1 workers). Scalability, the ability to sustain good performance as the number of processors increases, was studied by computing speedup values for both algorithms. In the distributed computing approach, the server also performs database searches, so speedup was calculated as the time taken by a single processor divided by the time taken by n processors.
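The two speedup definitions can be written compactly as follows, where T_p is the elapsed time of a run on p processors (this notation is ours, not the study's). Under perfect scaling the master-worker speedup on n processors is n-1 rather than n, because both the two-processor baseline and the n-processor run dedicate one processor to the master.

```latex
\begin{aligned}
S_{\mathrm{MW}}(n) &= \frac{T_2}{T_n}, &\qquad S_{\mathrm{MW}}^{\mathrm{ideal}}(n) &= n - 1 \quad \text{(master-worker, $n-1$ workers)},\\
S_{\mathrm{DC}}(n) &= \frac{T_1}{T_n}, &\qquad S_{\mathrm{DC}}^{\mathrm{ideal}}(n) &= n \quad \text{(distributed, server also searches)}.
\end{aligned}
```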
[Figures not reproduced: performance of parallel SSEARCH; performance of parallel FASTA; performance of SSEARCH and FASTA using a task-distribution system.]
Figure 5: Speedup curves for all-against-all database comparisons using Smith-Waterman and FASTA on 128 processors (distributed computing approach).
The above study indicates that good speedups can be seen when the algorithms are more compute-intensive. It also shows that the use of TCP/IP sockets with client/server communication models in a loosely coupled system architecture can be more beneficial when parallel clusters and parallel implementations of the codes are not available. Scientists without much knowledge of parallel implementations can make use of all the available machines connected to a Local Area Network (LAN) to perform huge database searches using sequence analysis codes.
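The client/server task-distribution idea above can be sketched with Python's standard `socket` and `threading` modules (Python 3.8+ assumed for `socket.create_server`). The line protocol (`GET`, `RESULT`, `DONE`), the function names, and the `fake_search` stand-in are all hypothetical, not the system used in the study. The server hands out database-chunk indices on request and collects each client's result; here every client runs as a thread on one machine, standing in for separate machines on a LAN.

```python
import socket
import threading


def serve_tasks(listener, num_clients, num_tasks):
    """Accept num_clients connections and hand out task ids 0..num_tasks-1.

    Hypothetical line protocol:
      client -> "GET"                   server -> "<task_id>" or "DONE"
      client -> "RESULT <id> <value>"
    Returns {task_id: value} once every client has disconnected.
    """
    results = {}
    next_task = 0
    lock = threading.Lock()  # guards next_task and results across handler threads

    def handle(conn):
        nonlocal next_task
        with conn, conn.makefile("r") as reader:
            for line in reader:  # loop ends when the client closes its connection
                parts = line.split()
                if not parts:
                    continue
                if parts[0] == "GET":
                    with lock:
                        task, next_task = next_task, next_task + 1
                    reply = f"{task}\n" if task < num_tasks else "DONE\n"
                    conn.sendall(reply.encode())
                elif parts[0] == "RESULT":
                    with lock:
                        results[int(parts[1])] = int(parts[2])

    handlers = []
    for _ in range(num_clients):
        conn, _addr = listener.accept()
        handler = threading.Thread(target=handle, args=(conn,))
        handler.start()
        handlers.append(handler)
    for handler in handlers:
        handler.join()
    return results


def run_client(address, search_chunk):
    """Request tasks until the server replies DONE, reporting each result back."""
    with socket.create_connection(address) as sock, sock.makefile("r") as reader:
        while True:
            sock.sendall(b"GET\n")
            reply = reader.readline().strip()
            if reply == "DONE":
                break
            task = int(reply)
            sock.sendall(f"RESULT {task} {search_chunk(task)}\n".encode())


def demo(num_clients=3, num_tasks=10):
    """Run the server and clients as threads on localhost."""

    def fake_search(task):
        # Hypothetical stand-in for "search database chunk `task`"; returns an int.
        return task * task

    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    address = listener.getsockname()
    clients = [threading.Thread(target=run_client, args=(address, fake_search))
               for _ in range(num_clients)]
    for client in clients:
        client.start()
    with listener:
        results = serve_tasks(listener, num_clients, num_tasks)
    for client in clients:
        client.join()
    return results
```

Because tasks are handed out one at a time on request, faster machines simply ask for more chunks, which is a simple form of dynamic load balancing on a heterogeneous LAN. To mirror the distributed approach in which the server also searches, one client would run on the server machine itself.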
What type of parallel platform was the application developed for? (distributed vs. shared memory, vector, etc.) What tools were used to build the application? (languages, libraries, etc.)
The platform is a distributed-memory cluster. All the performance tests were done on PARAM 10000, a parallel cluster of workstations designed and developed in-house. The two parallel codes were tuned to the PARAM system using MPICH, an implementation of the standard, portable Message Passing Interface (MPI version 1.1).
If the application is run on a major supercomputer, where does that computer rank on the Top 500 list?
The application has not been run on a major supercomputer.
How well did the application perform? How does this compare to the platform's best possible performance?
The experiments show that long genome sequences can be searched effectively using the parallel Smith-Waterman algorithm on PARAM, and that the optimized codes can be used effectively to search large databases. However, the interconnect between processors also plays an important role in performance: beyond a certain number of processors, the overhead of exchanging information grows, so the performance gain is no longer linear in the number of processors.
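The non-linear scaling described above can be illustrated with a toy cost model, which is our assumption and not taken from the study: the search work W is split evenly across p processors, while communication costs a fixed amount c per processor, so T(p) = W/p + c·p. Setting dT/dp = 0 gives an optimum near p* = sqrt(W/c); beyond that point, adding processors makes the run slower. All names and numbers below are hypothetical.

```python
import math


def model_time(p, work, comm_per_proc):
    """Hypothetical run time on p processors: evenly split compute plus linear communication."""
    return work / p + comm_per_proc * p


def model_speedup(p, work, comm_per_proc):
    """Speedup relative to one processor under the same toy model."""
    return model_time(1, work, comm_per_proc) / model_time(p, work, comm_per_proc)


def best_processor_count(work, comm_per_proc, max_p):
    """Processor count in 1..max_p that maximizes the modeled speedup."""
    return max(range(1, max_p + 1), key=lambda p: model_speedup(p, work, comm_per_proc))


def analytic_optimum(work, comm_per_proc):
    """dT/dp = -work / p**2 + comm_per_proc = 0  =>  p* = sqrt(work / comm_per_proc)."""
    return math.sqrt(work / comm_per_proc)
```

With the arbitrary values W = 1000 and c = 1, the best processor count is 32 (speedup roughly 15.8), and doubling to 64 processors lowers the modeled speedup to roughly 12.6. Real interconnect costs are more complex, but the shape of the curve matches the qualitative observation above.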
Does the application "scale" to large problems on many processors? If you believe it has not, what bottlenecks may have limited its performance?
Most of the benchmarks use at most 64 processors; only the distributed computing runs in Figure 5 go up to 128. Communication between processors is the likely bottleneck.