Publication date:
2010
Abstract:
Clustering is one of the most important unsupervised learning problems: it deals with finding structure in a collection of unlabeled data. However, different clustering algorithms applied to the same data-set produce different solutions. In many applications the problem of multiple solutions becomes crucial, and providing a limited group of good clusterings is often more desirable than a single solution. In this work we propose Least Square Consensus clustering, which allows a user to extract a small number of different clustering solutions from an initial (large) set of solutions obtained by applying any clustering algorithm to a given data-set. Two different implementations are presented. In both cases, each consensus is accompanied by a measure of quality defined in terms of Least Square error, and a graphical visualization is provided to make the result immediately interpretable. Numerical experiments are carried out on both synthetic and real data-sets. © 2010 Springer-Verlag.
CRIS type:
01.01 Journal article
Keywords:
Clustering solutions; Clusterings; Consensus algorithms; Consensus clustering; Data sets; Graphical visualization; Least Square; Least square errors; Multiple clusterings; Multiple solutions; Numerical experiments; Synthetic and real data; Unlabeled data; Artificial intelligence; Bioinformatics; Cluster analysis; Visualization; Clustering algorithms
Authors:
DE FEIS, Italia; Angelini, Claudia
Link to the full record: