Publication date:
2007
Abstract:
This work presents PFCNN, a distributed method for computing a consistent
subset of very large data sets for the nearest neighbor decision rule.
In order to cope with the communication overhead typical of distributed
environments and to reduce memory requirements, different variants of the basic
PFCNN method are introduced. Experiments on a class of
synthetic data sets show that these methods can be profitably applied to enormous
collections of data: they scale up well, are efficient in memory
consumption, and achieve noticeable data reduction and good classification accuracy.
To the best of our knowledge, this is the first distributed algorithm for
computing a training set consistent subset for the nearest neighbor rule.
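
To make the central notion concrete: a subset S of a training set T is consistent for the nearest neighbor rule when classifying every element of T by its nearest neighbor in S reproduces its original label. The following is a minimal brute-force sketch of that check only, not of PFCNN itself; the function names, the Euclidean distance, and the toy data are illustrative assumptions rather than anything taken from the paper.

```python
import math

def nearest_label(point, subset):
    """Return the label of the element of `subset` closest to `point`.

    `subset` is a list of (vector, label) pairs; Euclidean distance is an
    assumption made for this illustration.
    """
    closest = min(subset, key=lambda item: math.dist(point, item[0]))
    return closest[1]

def is_consistent(subset, training_set):
    """True if the 1-NN rule over `subset` classifies all of `training_set` correctly."""
    return all(nearest_label(x, subset) == y for x, y in training_set)

# Toy training set: two well-separated classes (hypothetical data).
training_set = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"),
    ((5.0, 6.0), "b"),
]

# One representative per class is enough to classify every training point.
two_points = [training_set[0], training_set[2]]

# A subset covering only class "a" misclassifies the "b" points.
one_point = [training_set[0]]
```

Here `is_consistent(two_points, training_set)` holds while `is_consistent(one_point, training_set)` does not, which illustrates the data reduction a consistent subset can offer: half of the toy training set suffices to classify all of it correctly.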
CRIS type:
01.01 Journal article
Author list:
Folino, Gianluigi; Angiulli, Fabrizio
Link to the full record: