Publication date:
2003
Abstract:
This paper presents the implementation of DCI++, an enhancement of DCI, a scalable algorithm for discovering frequent sets in large databases. The main contribution of DCI++ lies in a novel counting inference strategy, inspired by previously known results by Bastide et al. Moreover, multiple heuristics and efficient data structures are used to adapt the algorithm's behavior to the features of the specific dataset being mined and of the computing platform used. DCI++ turns out to be effective in mining both short and long patterns from a variety of datasets. We conducted a wide range of experiments on synthetic and real-world datasets, both in-core and out-of-core. The results allow us to state that the performance of DCI++ is not over-fitted to a special case, and that it remains high on datasets with different characteristics.
CRIS type:
04.01 Contribution in conference proceedings
Keywords:
Frequent Patterns Mining; Algorithms
Author list:
Orlando, Salvatore; Palmerini, Paolo; Silvestri, Fabrizio; Lucchese, Claudio; Perego, Raffaele
Link to the full record: