Maximizing pattern separation in discretizing continuous features for classification purposes
Conference proceedings contribution
Publication date:
2010
Abstract:
Discretization is a fundamental phase for many
classification algorithms: it aims at finding a proper set of
cutoffs that subdivide a continuous domain into homogeneous
intervals; the points in each interval should have a high
probability of belonging to the same class. This paper proposes
two different approaches for discretization: the first one consists
in retrieving the optimal set of separation points through
the solution of a proper linear programming problem. Since
the optimal solution may require an excessive computational
burden, an alternative technique, based on the iterative addition
of separation points, is described. The greedy algorithm is
evaluated on some artificial datasets and compared with other
well-known discretization techniques such as EntMDL. The
results of the simulations show that the novel algorithm performs
well in terms of both the accuracy of the solution and the
computational effort required to generate it.
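The idea of iteratively adding separation points can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single continuous feature, takes midpoints between consecutive distinct values as candidate cutoffs, and uses the number of opposite-class pairs that fall into different intervals as the separation score. The names `separated_pairs` and `greedy_cutoffs` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def separated_pairs(values: Sequence[float], labels: Sequence[int],
                    cutoffs: Sequence[float]) -> int:
    """Count pairs of points with different labels that lie in different intervals."""
    cuts = sorted(cutoffs)
    # Interval index of a point = number of cutoffs strictly below its value.
    bins = [bisect_left(cuts, v) for v in values]
    count = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if labels[i] != labels[j] and bins[i] != bins[j]:
                count += 1
    return count


def greedy_cutoffs(values: Sequence[float], labels: Sequence[int],
                   max_cutoffs: Optional[int] = None) -> List[float]:
    """Add one cutoff at a time, keeping the one that separates the most pairs."""
    distinct = sorted(set(values))
    # Assumption: candidates are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    chosen: List[float] = []
    best = 0
    while candidates and (max_cutoffs is None or len(chosen) < max_cutoffs):
        scored = [(separated_pairs(values, labels, chosen + [c]), c)
                  for c in candidates]
        # Ties are broken in favour of the smallest cutoff.
        score, cut = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:  # stop when no candidate improves the separation
            break
        best = score
        chosen.append(cut)
        candidates.remove(cut)
    return sorted(chosen)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 1, 1, 0, 0]
    print(greedy_cutoffs(xs, ys))  # [2.5, 4.5]
```

In this toy example the two selected cutoffs isolate the class-1 points in the middle interval, so all eight opposite-class pairs are separated. The pairwise count is quadratic in the number of points, which is acceptable for a sketch but would need a more efficient score on larger datasets.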
CRIS type:
04.01 Conference proceedings contribution
Keywords:
machine learning; discretization; classification problem
Authors:
Ferrari, Enrico; Muselli, Marco
Book title:
Proceedings of the World Congress on Computational Intelligence (WCCI 2010)
Published in: