Efficient Computation of Long Similar Subsequences

Report ID

2002-09

Report Authors

Abdullah N. Arslan and Omer Egecioglu

Report Date

2002-03-01

Abstract

Given sequences X of length n and Y of length m with n >= m, let LAt* and NLAt* denote the maximum ordinary, and maximum length normalized scores of local alignments with length at least a given threshold value t. The alignment length is defined as the sum of the lengths of the involved subsequences, and length normalized score of an alignmentis the quotient of the ordinary score by the alignment length. We develop an algorithm which finds an alignment with ordinary score >= LAt*, and length >= (1-1/r)t for a given r, in time O(rnm) and space O(rm).The algorithm can be used to find an alignment with length normalized score > lambda for a given positive lambda with the same time and space complexityand within the same approximation bounds. Thus this algorithm provides a length-approximate answer to a query such as \"Do X and Y share a (sufficiently long) fragment with more than 70% of similarity?\" We also show that our approach gives improved approximation algorithms for the normalized local alignment problem. In this case we can efficiently find an alignment with length >=(1-1/r)t which has a length nomalized score >= NLAt*.

Document

2002-09.ps249.61 KB