Data di Pubblicazione:
2012
Abstract:
Sizes of compressed bitmap indexes and compressed data are significantly affected by the order of data records. The optimal orders of rows and columns that minimizes the index sizes is known to be NP-hard to compute. Instead of seeking the precise global optimal ordering, we develop accurate statistical formulas that compute approximate solutions. Since the widely used bitmap indexes are compressed with variants of the run-length encoding (RLE) method, our work concentrates on computing the sizes of bitmap indexes compressed with the basic Run-Length Encoding. The resulting formulas could be used for choosing indexes to build and to use. In this paper, we use the formulas to develop strategies for reordering rows and columns of a data table. We present empirical measurements to show that our formulas are accurate for a wide range of data. Our analysis confirms that the heuristics of sorting columns with low column cardinalities first is indeed effective in reducing the index sizes. We extend the strategy by showing that columns with the same cardinality should be ordered from high skewness to low skewness.
Tipologia CRIS:
04.01 Contributo in Atti di convegno
Keywords:
Bitmap indexes; Optimal ordering; Run-length encoding; Empirical measurement; NP-hard
Elenco autori:
POURABBAS DOLATABAD, Elaheh
Link alla scheda completa:
Titolo del libro:
24th International Conference on Scientific and Statistical DatabaseManagement, SSDBM 2012