Distributional random oversampling for imbalanced text classification

Conference Paper
Publication Date:
2016
abstract:
The accuracy of many classification algorithms is known to suffer when the data are imbalanced (i.e., when the distribution of the examples across the classes is severely skewed). Many applications of binary text classification are of this type, with the positive examples of the class of interest far outnumbered by the negative examples. Oversampling (i.e., generating synthetic training examples of the minority class) is an often used strategy to counter this problem. We present a new oversampling method specifically designed for classifying data (such as text) for which the distributional hypothesis holds, according to which the meaning of a feature is somehow determined by its distribution in large corpora of data. Our Distributional Random Oversampling method generates new random minority-class synthetic documents by exploiting the distributional properties of the terms in the collection. We discuss results we have obtained on the Reuters-21578, OHSUMED-S, and RCV1-v2 datasets.
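For orientation, the general idea of oversampling mentioned in the abstract can be illustrated with the simplest baseline, plain random oversampling, which duplicates randomly chosen minority-class documents until the classes are balanced. This is a minimal sketch and is not the Distributional Random Oversampling method of the paper; the function name `random_oversample`, its parameters, and the toy documents are assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_oversample(docs, labels, minority_label=1, seed=0):
    """Plain random oversampling (a baseline, NOT the paper's DRO method).

    Randomly chosen minority-class documents are duplicated until the
    minority class is as large as all other examples combined.
    """
    rng = random.Random(seed)
    minority = [d for d, y in zip(docs, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority-class examples to oversample")
    n_majority = sum(1 for y in labels if y != minority_label)
    # Number of extra copies needed to reach balance (never negative).
    n_extra = max(0, n_majority - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(docs) + extra, list(labels) + [minority_label] * n_extra


# Hypothetical toy data: one positive document, four negative ones.
docs = [
    "wheat prices rise",
    "stocks fall",
    "markets rally",
    "bond yields dip",
    "oil steady",
]
labels = [1, 0, 0, 0, 0]

balanced_docs, balanced_labels = random_oversample(docs, labels)
print(Counter(balanced_labels))
```

Because this baseline only repeats existing documents, it adds no new information to the training set. According to the abstract, the paper's method instead generates new synthetic minority-class documents using the distributional properties of the terms in the collection.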
Iris type:
04.01 Contribution in conference proceedings
Keywords:
Distributional semantics; ARTIFICIAL INTELLIGENCE. Learning
List of contributors:
MOREO FERNANDEZ, Alejandro; Esuli, Andrea; Sebastiani, Fabrizio
Authors of the University:
ESULI ANDREA
MOREO FERNANDEZ ALEJANDRO DAVID
SEBASTIANI FABRIZIO
Handle:
https://iris.cnr.it/handle/20.500.14243/320945
Full Text:
https://iris.cnr.it//retrieve/handle/20.500.14243/320945/105747/prod_356991-doc_159207.pdf
URL

http://dl.acm.org/citation.cfm?id=2914722&CFID=812657189&CFTOKEN=16638796