Proc. 11-th Int. Conf. on Parallel and Distributed Computing and Systems (PDCS'99), Nov. 1999, Boston, pp. 194-200.

Ömer Egecioglu and Hakan Ferhatosmanoglu

Circular Data-Space Partitioning for Similarity Queries and Parallel Disk Allocation

Abstract. In a multiple disk environment it is desirable to have techniques for efficient parallel execution of similarity queries. Usually many buckets that may have the query result are needed to be retrieved from secondary storage, which is a costly operation. To achieve efficiency, there are two major factors that need to be considered. These are the number of buckets retrieved by the query, and the degree of parallelism provided by the disk allocation method. In this paper, we develop efficient techniques for parallel similarity searching by optimizing these two factors defined for data-sets that are circular in nature, and similarity queries consisting of query spheres centered at the query point. Our partitioning technique minimizes the expected number of buckets retrieved by a random query among a spectrum of partitioning schemes which have equi-area concentric rings and equi-area central wedges as its two extremes. A simple disk allocation technique for the proposed partitioning method that maximizes the degree of parallelism obtained is also described.

omer@cs.ucsb.edu