Report ID
2003-03
Report Authors
Orhan Camoglu, Tamer Kahveci, and Ambuj Singh
Report Date
Abstract
We consider the problem of finding similarities in protein structure databases. Our techniques extract feature vectors from triplets of SSEs (Secondary Structure Elements). These feature vectors are then indexed using a multidimensional index structure. Our first technique finds the proteins in a dataset that are similar to a query protein. It quickly prunes unpromising proteins using the index structure, and the remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique considers the problem of joining two protein datasets to find all-to-all similarities. Experimental results show that our techniques reduce the pruning time of VAST by a factor of 3 to 3.5 while keeping the sensitivity similar.
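A minimal Python sketch of the index-based filter step described above, offered under several assumptions, not as the authors' method. The abstract does not define the feature vector, so the choice here is hypothetical: the three SSE types act as a discrete key, and the numeric part holds the three SSE lengths plus the three pairwise centroid distances, which are invariant under rotation and translation. The `TripletIndex` class (a bucketed linear scan with an L-infinity box query) only stands in for a real multidimensional index such as an R-tree. The names `SSE`, `triplet_features`, `build_index`, `candidate_proteins`, `eps`, and `min_hits` are illustrative, and the surviving candidates would be handed to an aligner such as VAST, which is not invoked.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SSE:
    kind: str                           # 'H' (helix) or 'E' (strand)
    length: int                         # number of residues
    center: tuple[float, float, float]  # hypothetical: SSE centroid coordinates


def triplet_features(sses):
    """Yield (type_key, numeric_vector) for every SSE triplet, in sequence order.

    Hypothetical features: SSE types form a discrete key; the numeric part is
    three lengths plus three pairwise centroid distances.
    """
    for a, b, c in combinations(sses, 3):
        key = (a.kind, b.kind, c.kind)
        vec = (
            float(a.length), float(b.length), float(c.length),
            math.dist(a.center, b.center),
            math.dist(a.center, c.center),
            math.dist(b.center, c.center),
        )
        yield key, vec


class TripletIndex:
    """Stand-in for a multidimensional index (e.g. an R-tree).

    Entries are bucketed by SSE-type key; inside a bucket, a range query is a
    linear scan returning vectors within eps of the query in every coordinate.
    """

    def __init__(self):
        self._buckets = defaultdict(list)  # key -> [(protein_id, vec), ...]

    def insert(self, protein_id, key, vec):
        self._buckets[key].append((protein_id, vec))

    def range_query(self, key, vec, eps):
        return [pid for pid, v in self._buckets.get(key, [])
                if max(abs(x - y) for x, y in zip(v, vec)) <= eps]


def build_index(proteins):
    """Index every SSE triplet of every protein in a {protein_id: [SSE]} dict."""
    index = TripletIndex()
    for pid, sses in proteins.items():
        for key, vec in triplet_features(sses):
            index.insert(pid, key, vec)
    return index


def candidate_proteins(query_sses, index, eps, min_hits):
    """Filter step: keep proteins that match at least min_hits query triplets.

    Survivors would go on to a full structural aligner such as VAST.
    """
    hits = defaultdict(int)
    for key, vec in triplet_features(query_sses):
        # Count each protein at most once per query triplet.
        for pid in set(index.range_query(key, vec, eps)):
            hits[pid] += 1
    return sorted(pid for pid, n in hits.items() if n >= min_hits)


# Usage: "p1" is the query translated in space; "p2" has different SSE types.
query = [SSE("H", 12, (0.0, 0.0, 0.0)),
         SSE("E", 6, (10.0, 0.0, 0.0)),
         SSE("H", 15, (0.0, 10.0, 0.0))]
shifted = [SSE(s.kind, s.length, tuple(x + 5.0 for x in s.center)) for s in query]
other = [SSE("E", 4, (0.0, 0.0, 0.0)),
         SSE("E", 4, (30.0, 0.0, 0.0)),
         SSE("E", 4, (0.0, 30.0, 0.0))]
index = build_index({"p1": shifted, "p2": other})
print(candidate_proteins(query, index, eps=1.0, min_hits=1))  # ['p1']
```

Bucketing on the discrete SSE-type key keeps a helix from ever matching a strand regardless of `eps`, while the numeric tolerance absorbs small differences in lengths and distances. The all-to-all join described above could, under the same assumptions, be approximated by running `candidate_proteins` for each protein of one dataset against an index built on the other.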
Document
2003-03.pdf (184.9 KB)