CIAA2003 START ConferenceManager    

An Optimal Algorithm for Maximum-Sum Segment and Its Application in Bioinformatics

Tsai-Hung Fan, Shufen Lee, Hsueh-I Lu, Tsung-Shan Tsou, Tsai-Cheng Wang, and Adam Yao

Presented at Eighth International Conference on Implementation and Application of Automata (CIAA 2003), July 16-18, 2003 Santa Barbara, CA, USA


Abstract

We study a fundamental sequence algorithm arising from bioinformatics. Given two integers $L$ and $U$ and a sequence $A$ of $n$ numbers, the maximum-sum segment problem is to find a segment $A[i,j]$ of $A$ with $L\leq j-i+1\leq U$ that maximizes $A[i]+A[i+1]+\cdots+A[j]$. The problem finds applications in finding repeats, designing low complexity filter, and locating segments with rich C+G content for biomolecular sequences. The best known algorithm, due to Lin, Jiang, and Chao, runs in $O(n)$ time, based upon a clever technique called {\em left-negative decomposition} for $A$. In the present paper, we present a new $O(n)$-time algorithm that bypasses the left-negative decomposition. As a result, our algorithm has the capability to handle the input sequence in an online manner, which is clearly an important feature to cope with genome-scale sequences. We also show how to exploit the sparsity in the input sequence: If $A$ is representable in $O(k)$ space in some format, then our algorithm runs in $O(k)$ time. Moreover, practical implementation of our algorithm running on the rice genome helps us to identify a very long repeat structure in rice chromosome 1 that is previously unknown.