Range CUBE: Efficient Cube Computation by Exploiting Data Correlation

Report ID

2003-16

Report Authors

Ying Feng and Divykant Agrawal and Amr El Abbadi and Ahmed Metwally

Report Date

2003-06-01

Abstract

Data cube computation and representation are prohibitivelyexpensive in terms of time and space. Prior work has focused oneither reducing the computation time or condensing therepresentation of a data cube. In this paper, we introduce RangeCubing as an efficient way to compute and compress the data cubewithout any loss of precision. A new data structure, range trie,is used to compress and identify correlation in attribute values, and compressthe input dataset to effectively reduce the computational cost.The range cubing algorithm generates a compressed cube, calledrange cube, which partitions all cells into disjoint ranges. Eachrange represents a subset of cells with the identical aggregation value as a tuple which hasthe same number of dimensions as the input data tuples.The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-Cubing, experiments on real datasetshow a running time of less than one thirtieth, still generating arange cube of less than one ninth of the space of the full cube,when both algorithms run in their preferred dimension orders. Onsynthetic data, range cubing demonstrates much better scalability,as well as higher adaptiveness to both data sparsity and skew.

Document

2003-16.pdf291.4 KB