Distributional random oversampling for imbalanced text classification

Conference proceedings contribution
Publication date:
2016
Abstract:
The accuracy of many classification algorithms is known to suffer when the data are imbalanced (i.e., when the distribution of the examples across the classes is severely skewed). Many applications of binary text classification are of this type, with the positive examples of the class of interest far outnumbered by the negative examples. Oversampling (i.e., generating synthetic training examples of the minority class) is an often used strategy to counter this problem. We present a new oversampling method specifically designed for classifying data (such as text) for which the distributional hypothesis holds, according to which the meaning of a feature is somehow determined by its distribution in large corpora of data. Our Distributional Random Oversampling method generates new random minority-class synthetic documents by exploiting the distributional properties of the terms in the collection. We discuss results we have obtained on the Reuters-21578, OHSUMED-S, and RCV1-v2 datasets.
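For orientation, the general idea of oversampling mentioned in the abstract can be sketched with the simplest baseline: plain random oversampling, which duplicates randomly chosen minority-class documents until the two classes are balanced. This is not the Distributional Random Oversampling method of the paper, whose details are not given in this record. The function name `random_oversample`, its parameters, and the toy documents are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def random_oversample(docs, labels, minority_label=1, seed=0):
    """Plain random oversampling (baseline, NOT the paper's DRO method).

    Duplicates randomly chosen minority-class documents until the
    minority class is as large as the rest of the training set.
    """
    rng = random.Random(seed)
    minority = [d for d, y in zip(docs, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority-class examples to oversample")
    n_majority = sum(1 for y in labels if y != minority_label)
    # Number of extra copies needed to balance the two classes.
    n_extra = max(0, n_majority - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(docs) + extra, list(labels) + [minority_label] * n_extra

# Toy data (hypothetical): one positive document, four negative ones.
docs = ["wheat prices rise", "stocks fall", "bank merger",
        "oil output cut", "rates unchanged"]
labels = [1, 0, 0, 0, 0]

new_docs, new_labels = random_oversample(docs, labels)
print(Counter(new_labels))
```

Because this baseline only repeats existing documents, it adds no new information about the minority class; the abstract describes a method that instead generates synthetic documents from the distributional properties of terms.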
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Distributional semantics; ARTIFICIAL INTELLIGENCE. Learning
Author list:
MOREO FERNANDEZ, Alejandro; Esuli, Andrea; Sebastiani, Fabrizio
Institution authors:
ESULI ANDREA
MOREO FERNANDEZ ALEJANDRO DAVID
SEBASTIANI FABRIZIO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/320945
Link to the full text:
https://iris.cnr.it//retrieve/handle/20.500.14243/320945/105747/prod_356991-doc_159207.pdf

General data

URL

http://dl.acm.org/citation.cfm?id=2914722&CFID=812657189&CFTOKEN=16638796