Handling Different Categories of Concept Drifts in Data Streams Using Distributed GP
Contribution in conference proceedings
Publication date:
2010
Abstract:
Using Genetic Programming (GP) to classify data streams is problematic because GP is slow compared with traditional single-solution techniques. However, the availability of cheaper and better-performing distributed and parallel architectures makes it possible to tackle complex problems that were previously hard to solve owing to the large amount of time required. This work presents a general framework, based on a distributed GP ensemble algorithm, for coping with different types of concept drift when classifying large data streams. The framework detects changes very efficiently, using only a detection function computed on the incoming unclassified data. A distributed GP algorithm is therefore run to improve classification accuracy only when a change is detected, which limits the overhead associated with using a population-based method. Real-world data streams may present drifts of different types. The introduced detection function, based on the self-similarity fractal dimension, makes it possible to cope with the main types of drift in a very short time, as demonstrated by the first experiments performed on some artificial datasets. Furthermore, given an adequate number of resources, distributed GP can handle very frequent concept drifts.
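The abstract does not give the exact detection function, so the following is only a minimal Python sketch of the detect-then-retrain idea, under stated assumptions: the fractal dimension is taken to be the correlation dimension D2 estimated by box counting over a window of feature vectors normalized to [0, 1), and a change is flagged when D2 moves by more than a hypothetical `threshold`. The names `correlation_dimension`, `FractalDriftDetector`, `process_stream`, and the `retrain` callback are illustrative, not the authors' code; `retrain` merely stands in for launching the distributed GP ensemble.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]


def correlation_dimension(points: List[Point],
                          scales=(1 / 2, 1 / 4, 1 / 8, 1 / 16)) -> float:
    """Box-counting estimate of the correlation fractal dimension D2.

    Assumption: the window is non-empty and every coordinate lies in [0, 1).
    For each grid side r, S(r) = sum_i p_i^2, where p_i is the fraction of
    points falling in cell i; D2 is the least-squares slope of log S(r)
    against log r.
    """
    n = len(points)
    xs, ys = [], []
    for r in scales:
        cells_per_axis = int(round(1 / r))
        # Map each point to its grid cell; clamp so a coordinate of 1.0 stays in range.
        cells = Counter(
            tuple(min(int(c / r), cells_per_axis - 1) for c in p) for p in points
        )
        s = sum((count / n) ** 2 for count in cells.values())
        xs.append(math.log(r))
        ys.append(math.log(s))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


class FractalDriftDetector:
    """Hypothetical detector: flags a change when D2 of the current window
    differs from the reference D2 by more than `threshold`."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.reference: Optional[float] = None

    def update(self, window: List[Point]) -> bool:
        fd = correlation_dimension(window)
        if self.reference is None:
            self.reference = fd  # first window only sets the baseline
            return False
        drifted = abs(fd - self.reference) > self.threshold
        if drifted:
            self.reference = fd  # the new concept becomes the baseline
        return drifted


def process_stream(windows: List[List[Point]],
                   detector: FractalDriftDetector,
                   retrain: Callable[[List[Point]], None]) -> int:
    """Invoke the expensive `retrain` step only on windows flagged as drifted.

    Returns how many times `retrain` was called.
    """
    retrains = 0
    for window in windows:
        if detector.update(window):
            retrain(window)  # stand-in for the distributed GP run
            retrains += 1
    return retrains
```

With this design the cost of the population-based learner is paid only on flagged windows. For example, on a stream whose windows look like a line, a line, a filled square, and a filled square (D2 of about 1, 1, 2, 2), the detector flags only the third window, so `retrain` runs once.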
CRIS type:
04.01 Contribution in conference proceedings
Authors:
Folino, Gianluigi; Papuzzo, Giuseppe
Book title:
EuroGP 2010