|
CIAA2003 |
START ConferenceManager |
An Optimal Algorithm for Maximum-Sum Segment and Its Application in Bioinformatics
Tsai-Hung Fan, Shufen Lee, Hsueh-I Lu, Tsung-Shan Tsou, Tsai-Cheng Wang, and Adam Yao
Presented at
Eighth International Conference on Implementation and Application of
Automata (CIAA 2003),
July 16-18, 2003
Santa Barbara, CA, USA
Abstract
We study a fundamental sequence algorithm arising from bioinformatics. Given two integers $L$ and $U$ and a sequence $A$ of $n$ numbers,
the maximum-sum segment problem is to find a segment $A[i,j]$ of $A$ with $L\leq j-i+1\leq U$ that maximizes $A[i]+A[i+1]+\cdots+A[j]$.
The problem finds applications in finding repeats, designing low complexity filter, and locating segments with rich C+G content for biomolecular
sequences. The best known algorithm, due to Lin, Jiang, and Chao, runs in $O(n)$ time, based upon a clever technique called {\em left-negative
decomposition} for $A$. In the present paper, we present a new $O(n)$-time algorithm that bypasses the left-negative decomposition. As a
result, our algorithm has the capability to handle the input sequence in an online manner, which is clearly an important feature to cope with
genome-scale sequences. We also show how to exploit the sparsity in the input sequence: If $A$ is representable in $O(k)$ space in some
format, then our algorithm runs in $O(k)$ time. Moreover, practical implementation of our algorithm running on the rice genome helps us to
identify a very long repeat structure in rice chromosome 1 that is previously unknown.