PSI: Indexing Protein Structures for Fast Similarity Search

Report ID

2003-03

Report Authors

Orhan Camoglu, Tamer Kahveci, and Ambuj Singh

Report Date

2003-01-01

Abstract

We consider the problem of finding similarities in protein structure databases. Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements). These feature vectors are then indexed using a multidimensional index structure. Our first technique finds proteins similar to a query protein in a protein dataset. This technique quickly prunes unpromising proteins using the index structure. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique considers the problem of joining two protein datasets to find all-to-all similarities. Experimental results show that our techniques speed up the pruning step of VAST by a factor of 3 to 3.5 while keeping the sensitivity similar.
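The filter-then-align idea in the abstract can be illustrated with a minimal sketch. It is not the report's actual method: the abstract does not specify the feature definition or the index, so everything below is an assumption made for illustration. Each SSE is approximated as a 3D line segment, each triplet is described by the pairwise midpoint distances and inter-axis angles, and a coarse grid hash stands in for the multidimensional index. The names `triplet_features`, `quantize`, `build_index`, and `prune`, as well as the bin sizes, are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

# ASSUMPTION: an SSE is modeled as a segment (start_point, end_point) in 3D.

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _mid(sse):
    start, end = sse
    return tuple((x + y) / 2 for x, y in zip(start, end))

def _angle(u, v):
    cos = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding

def triplet_features(sses):
    """One 6-dimensional vector per SSE triplet: (distance, angle) per pair.

    ASSUMPTION: this feature choice is illustrative, not the report's.
    """
    feats = []
    for a, b, c in combinations(sses, 3):
        vec = []
        for p, q in ((a, b), (a, c), (b, c)):
            vec.append(_norm(_sub(_mid(p), _mid(q))))            # midpoint distance
            vec.append(_angle(_sub(p[1], p[0]), _sub(q[1], q[0])))  # axis angle
        feats.append(tuple(vec))
    return feats

def quantize(vec, dist_step=2.0, angle_step=math.pi / 6):
    """Map a feature vector to a grid cell (hypothetical bin sizes)."""
    return tuple(
        int(x // (dist_step if i % 2 == 0 else angle_step))
        for i, x in enumerate(vec)
    )

def build_index(proteins):
    """Grid-hash stand-in for a multidimensional index: cell -> protein names."""
    index = defaultdict(set)
    for name, sses in proteins.items():
        for vec in triplet_features(sses):
            index[quantize(vec)].add(name)
    return index

def prune(index, query_sses, min_hits=1):
    """Keep proteins that share at least min_hits grid cells with the query."""
    hits = defaultdict(int)
    for vec in triplet_features(query_sses):
        for name in index.get(quantize(vec), ()):
            hits[name] += 1
    return sorted(name for name, h in hits.items() if h >= min_hits)

# Toy data: B is A translated by (10, 0, 0); C has an unrelated geometry.
protein_a = [((0, 0, 0), (0, 0, 6)), ((5, 0, 0), (5, 6, 0)),
             ((0, 8, 0), (6, 8, 0)), ((3, 3, 9), (3, 3, 15))]
protein_b = [(tuple(x + d for x, d in zip(s, (10, 0, 0))),
              tuple(x + d for x, d in zip(e, (10, 0, 0)))) for s, e in protein_a]
protein_c = [((0, 0, 0), (0, 0, 6)), ((30, 0, 0), (30, 0, 6)),
             ((0, 40, 0), (0, 40, 6)), ((50, 50, 0), (50, 50, 6))]

index = build_index({"B": protein_b, "C": protein_c})
candidates = prune(index, protein_a, min_hits=2)
print(candidates)
```

Only the proteins in `candidates` would then be passed to a structural alignment tool such as VAST. A production index would use a proper multidimensional structure with range queries rather than exact cell matches, since grid hashing can miss near neighbors that fall on opposite sides of a cell boundary.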

Document

2003-03.pdf (184.9 KB)