Efficient Skyline Computation over Ad-hoc Aggregations

Report ID

2008-04

Report Authors

Shyam Antony, Ping Wu, Divyakant Agrawal, Amr El Abbadi

Report Date

2008-04-01

Abstract

Aggregation is among the core functionalities of OLAP systems. Aggregation queries are frequently issued in decision support systems to identify interesting groups of data. When more than one aggregation function is involved and the notion of interest is not clearly defined, skyline queries provide a robust mechanism to capture the potentially interesting points: (i) users do not need to specify a ranking function, and (ii) the result is independent of the dimension scales. To provide better exploration functionality in OLAP systems, we propose in this paper to use skyline queries over aggregated data to identify the most interesting groups. Since the aggregation function has to be ad-hoc to cover a wide variety of user interests, the skyline over the aggregates has to be computed on the fly. Hence any algorithm that computes such a skyline must be fast and must produce the result set progressively, with potential skyline groups being produced as early as possible. We explore a family of algorithms that try to consume only as many data records as are necessary to compute the skyline, and we design an optimal algorithm. We further refine the algorithm by taking into account systems issues such as disk behavior, which are often ignored but have a strong impact on real system performance. Experimental results validate the performance and progressive benefits of our algorithm.
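
To make the notion of a skyline over aggregates concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the report: it scans all records and compares all pairs of groups, whereas the report aims to consume only as many records as necessary and to emit results progressively. The record layout, the choice of `sum` and `max` as aggregation functions, the sample data, and the convention that larger values are better are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate(records, agg_funcs):
    """Group (key, value) records by key and apply every aggregation function."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    # Each group becomes one point with one coordinate per aggregation function.
    return {key: tuple(f(vals) for f in agg_funcs) for key, vals in groups.items()}


def dominates(a, b):
    """a dominates b if a is at least as good everywhere and strictly better somewhere.

    Assumption: larger values are better in every dimension.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points):
    """Naive quadratic skyline: keep the groups that no other group dominates."""
    return {
        k: p
        for k, p in points.items()
        if not any(dominates(q, p) for j, q in points.items() if j != k)
    }


# Hypothetical sales records: (region, sale amount).
records = [
    ("east", 10), ("east", 30),
    ("west", 25), ("west", 25),
    ("north", 5), ("north", 8),
    ("south", 40),
]

aggs = aggregate(records, (sum, max))  # (total sales, largest single sale)
result = skyline(aggs)
print(sorted(result))
```

Here `south` with (40, 40) dominates `east` with (40, 30), and both dominate `north`, while `west` with (50, 25) and `south` are incomparable, so both remain in the skyline. No ranking function is needed, and rescaling either dimension by a positive factor leaves the result unchanged.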

Document

2008-04.pdf (237.67 KB)