Experiments on the use of feature selection and negative evidence in automated text categorization
Conference proceedings contribution
Publication date:
2000
Abstract:
We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) is the task of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r' ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique based on a simplified variant of the χ² statistic. Classifier induction is instead the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant of the well-known k-NN method, based on the exploitation of negative evidence. We report the results of systematic experimentation with these two methods on the standard Reuters-21578 benchmark.
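As a rough illustration of the feature selection step described in the abstract, the sketch below scores each word against one category with the standard χ² statistic and keeps the r' highest-scoring words. It is not the simplified variant the paper proposes, which the abstract does not spell out. The function names, the set-of-words document representation, and the alphabetical tie-breaking are all assumptions made for this example.

```python
def chi_square(a, b, c, d):
    """Standard chi-square score from a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Term or category is present in all or none of the documents.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, category, r_prime):
    """Return the r_prime terms with the highest chi-square score.

    docs: list of sets of words (hypothetical representation)
    labels: list of sets of category names, aligned with docs
    """
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for words, cats in zip(docs, labels):
            has_term = term in words
            in_cat = category in cats
            if has_term and in_cat:
                a += 1
            elif has_term:
                b += 1
            elif in_cat:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically (an assumption).
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:r_prime]
```

Note that the standard statistic scores a word highly whether it is strongly associated with the presence or with the absence of the category, so words typical of other categories can also be selected.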
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Content analysis and indexing
Author list:
Sebastiani, Fabrizio
Book title:
Research and Advanced Technology for Digital Libraries: 4th European Conference, ECDL 2000, Lisbon, Portugal, September 18-20, 2000. Proceedings