Discretizing continuous attributes in AdaBoost for text categorization

Conference proceedings contribution
Publication date:
2003
Abstract:
We focus on two recently proposed algorithms in the family of "boosting"-based learners for automated text classification, AdaBoost.MH and AdaBoost.MHKR. While the former is a realization of the well-known AdaBoost algorithm specifically aimed at multi-label text categorization, the latter is a generalization of the former based on the idea of learning a committee of classifier sub-committees. Both algorithms have been among the best performers in text categorization experiments so far. A problem in the use of both algorithms is that they require documents to be represented by binary vectors, indicating the presence or absence of each term in the document. As a consequence, these algorithms cannot take full advantage of the "weighted" representations (consisting of vectors of continuous attributes) that are customary in information retrieval tasks, and that provide a much more significant rendition of the document's content than binary representations. In this paper we address the problem of exploiting the potential of weighted representations in the context of AdaBoost-like algorithms by discretizing the continuous attributes through the application of entropy-based discretization methods. We present experimental results on the Reuters-21578 text categorization collection, showing that for both algorithms the version with discretized continuous attributes outperforms the version with traditional binary representations.
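As a rough illustration of what entropy-based discretization of a continuous attribute can look like, the sketch below picks a single cut point for one term weight by maximizing information gain against binary category labels, then maps the weights to a binary attribute. It is a minimal sketch under stated assumptions, not the method used in the paper: the function names (`entropy`, `best_cut`, `binarize`), the single-cut strategy, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(weights, labels):
    """Return (threshold, information_gain) for the best single cut point.

    Candidate thresholds are midpoints between consecutive distinct weights.
    Returns (None, 0.0) when no cut point yields a positive gain.
    Assumption: one cut per attribute; real methods may split recursively.
    """
    pairs = sorted(zip(weights, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_threshold, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal weights
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted average entropy of the two partitions
        split = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - split
        if gain > best_gain:
            best_threshold, best_gain = (lo + hi) / 2, gain
    return best_threshold, best_gain


def binarize(weights, threshold):
    """Map continuous weights to a binary attribute: 1 if above the cut."""
    return [1 if w > threshold else 0 for w in weights]


# Toy data (hypothetical): weights of one term in six documents,
# with 1 meaning the document belongs to the category.
weights = [0.0, 0.1, 0.2, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
threshold, gain = best_cut(weights, labels)
binary_attribute = binarize(weights, threshold)
```

The resulting binary attribute can be fed to a learner that expects presence/absence features, which is the general idea behind making weighted representations usable by such algorithms.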
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Text categorization
Author list:
Sebastiani, Fabrizio
Institutional authors:
SEBASTIANI FABRIZIO
Link to full record:
https://iris.cnr.it/handle/20.500.14243/79596
Book title:
Advances in Information Retrieval
General Data

URL

http://link.springer.com/chapter/10.1007%2F3-540-36618-0_23