Images have become an important source of data in many areas of science, including biology, geography, and astronomy, because they reveal spatial information that is not immediately available from other data sources. In this paper, we introduce a novel approach, QUIP (QUerying Image Patterns), for retrieving significant spatial patterns from a large collection of such images. This capability can give domain scientists important clues about the underlying processes that produce the images.
The query pattern of interest is specified as a rectangular region of a tiled image. A scoring formula is designed to discriminate the significant foreground patterns from the irrelevant background of the region. Candidate database regions that match the query are translated into a score matrix of the pairwise aligned tiles. We show that the problem of finding the maximal scoring connected sub-region of this matrix is NP-hard and develop an effective dynamic programming heuristic. To assist the user, each retrieved database pattern is assigned a p-value that indicates its statistical significance. Finally, to accelerate QUIP, we adopt the threshold algorithm to retrieve the candidate database matches efficiently and a bounding method to speed up p-value computation.
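As a rough illustration of the candidate-retrieval step, the following is a minimal sketch of the generic threshold algorithm for top-k retrieval over sorted score lists. It is not QUIP's implementation. It assumes that each query tile yields one score per candidate database region, that every list scores the same set of candidates, and that a candidate's overall score is the sum of its per-tile scores. The function name `threshold_top_k`, the candidate identifiers, and the scores are hypothetical.

```python
import heapq


def threshold_top_k(score_lists, k):
    """Generic threshold algorithm with sum aggregation (illustrative only).

    score_lists: list of dicts, one per query tile (an assumption), each
                 mapping a candidate id to that candidate's score.
    Returns (top_k, rounds): top_k is a list of (total, candidate) pairs in
    descending order; rounds is the number of sorted-access rounds used.
    """
    # Sorted access: each list is ordered by descending score.
    ranked = [sorted(s.items(), key=lambda kv: kv[1], reverse=True)
              for s in score_lists]
    n = len(ranked[0]) if ranked else 0
    totals = {}   # candidates whose full score has already been computed
    best = []     # min-heap holding the current top-k (total, candidate)
    rounds = 0
    for depth in range(n):
        rounds += 1
        threshold = 0  # upper bound on the total of any unseen candidate
        for lst in ranked:
            cand, score = lst[depth]
            threshold += score
            if cand not in totals:
                # Random access: look up this candidate in every list.
                total = sum(s[cand] for s in score_lists)
                totals[cand] = total
                if len(best) < k:
                    heapq.heappush(best, (total, cand))
                elif total > best[0][0]:
                    heapq.heapreplace(best, (total, cand))
        # Stop once k candidates score at least as high as the threshold.
        if len(best) == k and best[0][0] >= threshold:
            break
    return sorted(best, reverse=True), rounds


# Hypothetical scores of four database regions against two query tiles.
tile_a = {"r1": 9, "r2": 8, "r3": 1, "r4": 3}
tile_b = {"r1": 8, "r2": 7, "r3": 2, "r4": 1}
top, rounds = threshold_top_k([tile_a, tile_b], k=2)
print(top, rounds)
```

In this toy input the scan stops after two of the four possible rounds, because no unseen candidate can exceed the threshold at that depth. Early termination of this kind is what makes the approach attractive for large databases.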
We evaluate QUIP on three datasets of microscopy images of the retina. For each dataset, the retrieved results are significant to the domain scientists. Our method also has a practical running time and scales well with both database size and query size.