Report ID
2008-04
Report Authors
Shyam Antony, Ping Wu, Divyakant Agrawal, Amr El Abbadi
Report Date
Abstract

Aggregation is among the core functionalities of OLAP systems. Aggregation queries are frequently issued in decision support systems to identify interesting groups of data. When more than one aggregation function is involved and the notion of interest is not clearly defined, skyline queries provide a robust mechanism for capturing the potentially interesting points, because (i) users do not need to specify a ranking function and (ii) the result is independent of the dimension scales. To provide better exploration functionality in OLAP systems, in this paper we propose to use skyline queries over aggregated data to identify the most interesting groups. Since the aggregation functions must be ad hoc to cover a wide variety of user interests, the skyline over the aggregates has to be computed on the fly. Hence any algorithm that computes such a skyline must be fast and must produce the result set progressively, emitting potential skyline groups as early as possible. We explore a family of algorithms that try to consume only as many data records as are necessary to compute the skyline, and we design an optimal algorithm. We further refine the algorithm by taking into account systems issues such as disk behavior, which are often ignored but have a strong impact on real system performance. Experimental results validate the performance and progressive benefits of our algorithm.
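
To make the notion of a skyline over aggregates concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the report. It naively scans all records, computes two aggregates per group, and then keeps every group that is not dominated by another group. The function names (`aggregate`, `dominates`, `skyline`), the sample records, and the assumption that larger aggregate values are preferable are all hypothetical choices made for illustration. Unlike the report's approach, this sketch is neither progressive nor economical in the number of records it reads.

```python
from collections import defaultdict


def aggregate(records, agg_funcs):
    """Group (key, value) records and apply each aggregate function per group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: tuple(f(vals) for f in agg_funcs) for key, vals in groups.items()}


def dominates(a, b):
    """a dominates b if a is at least as good everywhere and strictly better somewhere.

    Assumption for this sketch: larger values are better in every dimension.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points):
    """Return the groups whose aggregate vector is not dominated by any other group."""
    return {
        key: p
        for key, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != key)
    }


def avg(values):
    return sum(values) / len(values)


# Hypothetical sales records: (region, amount).
records = [
    ("east", 10), ("east", 30),
    ("west", 9), ("west", 9), ("west", 9), ("west", 9), ("west", 9),
    ("north", 8), ("north", 9),
]

aggregates = aggregate(records, [sum, avg])
result = skyline(aggregates)

# Rescaling one dimension leaves the set of skyline groups unchanged,
# since dominance only compares values within a dimension.
rescaled = {k: (s, a * 1000) for k, (s, a) in aggregates.items()}
result_rescaled = skyline(rescaled)
```

In this example, "east" has the highest average and "west" has the highest sum, so neither dominates the other and both are in the skyline, while "north" is dominated by "east" on both aggregates. No ranking function combining sum and average is needed to reach that result.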

Document
2008-04.pdf (237.67 KB)